Abstract

We study the errors introduced by finite volume effects and dilution effects in the practical determination of the count probability distribution function $P_N(n,\ell)$, which is the probability of having $N$ objects in a cell of volume $\ell^3$ for a set of average number density $n$. Dilution effects are particularly relevant to the so-called sparse sampling strategy. This work is mainly done in the framework of the scaling model (Balian & Schaeffer 1989), which assumes that the $Q$-body correlation functions obey the scaling relation $\xi_Q(\lambda r_1, \ldots, \lambda r_Q) = \lambda^{-(Q-1)\gamma}\,\xi_Q(r_1, \ldots, r_Q)$. We use three synthetic samples as references to perform our analysis: a fractal generated by a Rayleigh-Levy random walk with $\sim 3 \times 10^4$ objects, a sample dominated by a spherical power-law cluster with $\sim 3 \times 10^4$ objects, and a cold dark matter (CDM) universe involving $\sim 3 \times 10^5$ matter particles.

The void probability, $P_0$, is seen to be quite weakly sensitive to finite sample effects if $P_0 V \ell^{-3} \gtrsim 1$, where $V$ is the volume of the sample (but $P_0$ is not immune to spurious grid effects in the case of numerical simulations from quiet initial conditions). If this condition is met, the scaling model can be tested with a high degree of accuracy. Still, the most interesting regime, where the scaling predictions are quite unambiguous, is reached only when $n\ell_0^3 \gtrsim 30$-$50$, where $\ell_0$ is the (pseudo-)correlation length at which the averaged two-body correlation function over a cell is unity. For the galaxy distribution, this corresponds to $n \gtrsim 0.02$-$0.03\,h^3$ Mpc$^{-3}$.

The count probability distribution for $N \ne 0$ is quite sensitive to discreteness effects. Furthermore, the measured large $N$ tail appears increasingly irregular with $N$, until a sharp cutoff is reached. These wiggles and the cutoff are finite volume effects. It is still possible to use the measurements to test the scaling model properties with good accuracy, but the sample has to be as dense and large as possible. Indeed the condition $n\ell_0^3 \gtrsim 80$-$120$ is required, or equivalently $n \gtrsim 0.04$-$0.06\,h^3$ Mpc$^{-3}$. The number densities of the current three dimensional galaxy catalogues are thus not large enough to test fairly the predictions of the scaling model. Of course, these results strongly argue against sparse sampling strategies.

subject headings: galaxies: clustering - methods: numerical - methods: statistical 
1 Introduction



The observed galaxy distribution is generally admitted to be homogeneous at scales larger than $\sim 100\,h^{-1}$ Mpc (with $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$). At small scales, however, it is strongly clustered. The measured two-body correlation function $\xi_2(r)$ indeed exhibits a power-law behavior,

$$\xi_2(r) \simeq (r/r_0)^{-\gamma} \qquad (1)$$

(Totsuji & Kihara 1969, Peebles 1974), over a large scale range ($0.1 \lesssim r \lesssim 10\,h^{-1}$ Mpc). The measurement of higher order correlation functions $\xi_Q(r_1, \ldots, r_Q)$ moreover seems to indicate that they can be hierarchically decomposed as sums of products of $Q-1$ terms in $\xi_2$ (Groth & Peebles 1977, Fry & Peebles 1978, Davis & Peebles 1983, Sharp et al. 1984) at least up to $Q = 8$ (Szapudi et al. 1992). This hierarchical model is a particular case of the more general scaling relation (Balian & Schaeffer 1988, 1989a, hereafter BS)

$$\xi_Q(\lambda r_1, \ldots, \lambda r_Q) = \lambda^{-(Q-1)\gamma}\,\xi_Q(r_1, \ldots, r_Q). \qquad (2)$$

It is commonly believed that the main source of galaxy clustering is gravitational instability. Both theoretical (Davis & Peebles 1977, Peebles 1980 hereafter LSS, Fry 1984a, Hamilton 1988, Balian & Schaeffer 1988) and numerical arguments (Efstathiou et al. 1988, Bouchet et al. 1991, hereafter BSD, Bouchet & Hernquist 1992, hereafter BH) indeed suggest that a system with Gaussian initial fluctuations reaches a scale invariant behavior in the non-linear regime ($\xi_2 \gg 1$). The resulting hierarchy of correlations naturally depends on the power spectrum of the initial conditions. In the weakly non-linear regime ($\xi_2 \ll 1$), perturbation theory shows that a similar hierarchy appears with $\xi_Q \propto \xi_2^{Q-1}$ (LSS, Fry 1984b, Goroff et al. 1986, Grinstein & Wise 1987, Bernardeau 1991, Juszkiewicz et al. 1992), but the hierarchy of correlations (i.e. the constants of proportionality) may of course differ from the one obtained in the highly non-linear regime (see, e.g., Colombi et al. 1994, hereafter CBS, Lucchin et al. 1994). A change in behavior is therefore expected at scales close to the correlation length, which does not seem to be the case at low $Q$ in the three dimensional observed galaxy catalogs (Bouchet et al. 1991, 1993, Gaztañaga 1992). Lahav et al. (1993, but see also Matsubara & Suto 1994) have recently argued that this could be due to the effects of projection in redshift space. Hence, a detailed study of the scaling model is an important step toward understanding large scale structure dynamics and statistics.

It is tempting to assume that the property (2) holds for any $Q$ and to see what the consequences are for the galaxy distribution. However, the direct measurement of the $Q$-body correlation functions becomes practically difficult and quite noisy when $Q \gtrsim 5$. Since the relation (2) reflects a homothetic property, it suggests looking only at the scaling behavior of the underlying distribution and forgetting the angular dependence of the correlation functions. The corresponding statistical tool is precisely the count probability distribution function $P_N(n,\ell)$ (hereafter CPDF). It measures, in a discrete set of average number density $n$, the probability that a cell of volume $v = \ell^3$ (or of size $\ell$), randomly thrown in the set, contains $N$ objects.

The CPDF is indirectly related to the averaged correlations $\bar\xi_Q(\ell)$, defined by

$$\bar\xi_Q(\ell) = \frac{1}{v^Q} \int_v d^3r_1 \cdots d^3r_Q\; \xi_Q(r_1, \ldots, r_Q). \qquad (3)$$

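Indeed, its generating function $\mathcal{P}(\lambda) = \sum_N \lambda^N P_N$ can be written (see, e.g., BS)

$$\mathcal{P}(\lambda) = \exp\left\{ \sum_{Q=1}^{\infty} \frac{(\lambda-1)^Q}{Q!}\,(n\ell^3)^Q\,\bar\xi_Q(\ell) \right\}, \qquad \bar\xi_1 \equiv 1. \qquad (4)$$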

The CPDF is simple to measure and has several applications. The evaluation of the moment of order two of the CPDF, related to the averaged two-body correlation, indirectly estimates the effective clustering of the system as a function of scale and can be used to check the large scale homogeneity of the galaxy distribution. The moment of order three, related to the skewness of the distribution, is used to test gravitational instability in the weakly non-linear regime (Juszkiewicz & Bouchet 1991, Juszkiewicz et al. 1993). The moment of order $q$ ($q$ real) of the CPDF provides indications on the multifractal behavior of the system (Balian & Schaeffer 1989b, Colombi et al. 1992).
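
As an illustration of how simple the measurement is, the sketch below estimates the CPDF by throwing cubic cells at random in a unit box, then derives the mean count and the averaged two-body correlation from the first two factorial moments, using the standard relation $\langle N(N-1) \rangle = (n\ell^3)^2 (1 + \bar\xi_2)$. It is only a minimal sketch and not the procedure used for the measurements quoted in this paper: the function names, the periodic treatment of the box boundaries and the brute-force loop over points are assumptions made for the example.

```python
import random
from collections import Counter


def count_probability(points, cell_size, n_cells=2000, seed=0):
    """Estimate P_N by throwing cubic cells at random in a unit box.

    points    : list of (x, y, z) tuples with coordinates in [0, 1)
    cell_size : cell side l in units of the box size (l < 1)
    Assumption of this sketch: the box is treated as periodic.
    Returns a dict {N: P_N}.
    """
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(n_cells):
        corner = (rng.random(), rng.random(), rng.random())
        n_in_cell = 0
        for p in points:
            # Periodic offset of the point from the cell corner, in [0, 1).
            if all((p[k] - corner[k]) % 1.0 < cell_size for k in range(3)):
                n_in_cell += 1
        histogram[n_in_cell] += 1
    return {n: c / n_cells for n, c in sorted(histogram.items())}


def factorial_moment_stats(p_n):
    """Return (mean count, averaged two-body correlation) from a CPDF."""
    mean = sum(n * p for n, p in p_n.items())
    second_factorial = sum(n * (n - 1) * p for n, p in p_n.items())
    xi2_bar = second_factorial / mean**2 - 1.0
    return mean, xi2_bar
```

For an uncorrelated (poissonian) set of points, `factorial_moment_stats` should return a value of $\bar\xi_2$ compatible with zero within the sampling noise.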

Several models have been proposed to predict the shape of the function $P_N(n,\ell)$ and we quote here the most important ones. The oldest one is certainly the log-normal distribution (studied in detail by Coles & Jones 1991), since Hubble had already noticed in 1934 that it provided a good fit of the counts in cells measured on the projected galaxy distribution. Saslaw & Hamilton (1984), using a thermodynamic approach based on an assumption of statistical equilibrium, reached an analytical model where the CPDF depends on only one parameter $b(\ell)$. It was also seen to give a good description of the measured CPDF in the galaxy distribution (Crane & Saslaw 1986, 1988). Fry (1986) used the negative binomial model to fit the void probability of the galaxy distribution. This model was proposed by Carruthers & Shih (1983) to explain particle multiplicities in high energy collisions. More recently, Balian & Schaeffer (BS) computed analytical predictions for the CPDF, assuming that the scaling relation (2) holds for any $Q$. They found that the function $P_N(n,\ell)$ should exhibit non-trivial invariance properties that we now recall.

Consequences of the scaling relation were first studied for the void probability $P_0(n,\ell)$ (hereafter VPDF) by White (1979) and Schaeffer (1984). They showed that, if the scaling relation applies, the function

$$\sigma(n,\ell) \equiv -\frac{\ln P_0(n,\ell)}{n\ell^3} \qquad (5)$$

should depend only on the characteristic number of objects inside a cell located in an overdense region:

$$\sigma(n,\ell) = \sigma(N_c), \qquad (6)$$

with

$$N_c = n\ell^3\,\bar\xi_2(\ell). \qquad (7)$$

This invariance property was successfully tested on the observed galaxy distribution (Sharp 1981, Bouchet & Lachieze-Rey 1986, Maurogordato & Lachieze-Rey 1987, Fry et al. 1989, Maurogordato et al. 1992), which suggests that equation (2) can indeed be generalized to all $Q$. Evaluating the VPDF in three numerical simulations, with cold dark matter (CDM), hot dark matter and white noise initial conditions, BH measured a slight disagreement with the scaling relation, but concluded that it could be attributed to misleading effects. Indeed, the apparent deviation was too small in comparison with the possible systematic and other ill-constrained errors. Vogeley et al. (1992) measured the VPDF on the extended CfA2 catalog and claimed to find a significant deviation from the scaling relation at large scales. However, they did not properly take into account systematic errors due to finite sample effects, as will be discussed in § 3. Before far-reaching conclusions are accepted, one indeed has to evaluate all the errors due to the unavoidable limitations of any realistic sample. Conversely, one should ask the following simple and practical questions: does a sample that should not verify the scaling relation always significantly disagree with equation (6)? Does a physical realization of an ideal scale invariant distribution always exhibit this property?
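
To make the test of the invariance property concrete, the following sketch computes the pair $(N_c, \sigma)$ from a measured CPDF, taking $\sigma = -\ln P_0/(n\ell^3)$ and $N_c = n\ell^3\,\bar\xi_2$, with $n\ell^3$ and $\bar\xi_2$ estimated from the first two factorial moments of the counts. The function name and the dictionary representation of the CPDF are assumptions of this example, not something prescribed by the text.

```python
import math


def sigma_and_nc(p_n):
    """Return (sigma, N_c) from a CPDF given as a dict {N: P_N}.

    sigma = -ln(P_0) / (n l^3) and N_c = n l^3 * xi2_bar, where the mean
    count n l^3 and xi2_bar come from the first two factorial moments.
    sigma is None when no empty cell was found (P_0 = 0).
    """
    mean = sum(n * p for n, p in p_n.items())
    second_factorial = sum(n * (n - 1) * p for n, p in p_n.items())
    xi2_bar = second_factorial / mean**2 - 1.0
    p0 = p_n.get(0, 0.0)
    sigma = -math.log(p0) / mean if p0 > 0.0 else None
    return sigma, mean * xi2_bar
```

Plotting $\sigma$ against $N_c$ for several densities and scales then shows directly whether the points fall on a single curve. For a pure Poisson distribution the function returns $\sigma = 1$ and $N_c = 0$.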

As for the VPDF, it was found by BS that, if the scaling relation applies, the function $P_N(n,\ell)$ scales in the strongly non-linear regime ($\bar\xi_2 \gg 1$) as a universal function $h(x)$ that depends on only one variable, $x = N/N_c$, instead of $N$, $n$ and $\ell$. More specifically, let us define the function $h(N,n,\ell)$ by

$$h(N,n,\ell) \equiv \frac{N_c^2}{n\ell^3}\,P_N(n,\ell). \qquad (8)$$

Then, if equation (2) is verified and if $\bar\xi_2 \gg 1$, the function $h$ can be written, in a certain regime,

$$h(N,n,\ell) \simeq h(x), \qquad x = N/N_c. \qquad (9)$$

The distribution function $h(x)$ is not arbitrary. It should present some asymptotic behaviors for $x \ll 1$ and $x \gg 1$, that we will detail in § 4. Its moments of order $Q$ are proportional to the ratios

$$S_Q(\ell) \equiv \frac{\bar\xi_Q}{\bar\xi_2^{\,Q-1}}, \qquad (10)$$

that are constant with respect to scale if the scaling relation applies. Furthermore, the function $h$ is related to $\sigma(y)$ by the following transform

$$\sigma(y) = \frac{1}{y} \int_0^\infty \left(1 - e^{-yx}\right) h(x)\,dx, \qquad (11)$$

so the functions $\sigma(y)$ and $h(x)$ theoretically contain the same information (but, practically, their determinations are complementary; see, e.g., BSD, BH).

The invariance property (9) was seen to be verified in observational three dimensional galaxy catalogs (Alimi et al. 1990, Maurogordato et al. 1992). But the measurements were quite noisy, because of the smallness of these samples. In the much richer sets of points coming from $N$-body simulations, the relation (9) seems to be fulfilled with great accuracy (BSD, BH) and the measured function $h$ has the expected asymptotic behaviors. However, testing the predictions of the scaling model on the CPDF is much more difficult than on the VPDF. Indeed, significant deviations from these predictions can be expected, even if the underlying distribution is perfectly scale invariant. Equation (9) is only valid in an asymptotic regime (never reached in practical cases) and for ideal sets of infinite volume. Because of the finite size of real samples, the very large $N$ part of the count probability is unduly dominated by a few large clusters and therefore always presents a behavior incompatible with equation (9) (BSD, CBS). One consequence is that the direct measurement of the low-order moments is not realistic (Colombi & Bouchet 1991, CBS). Since one of the validity conditions of relation (9) (detailed in § 4) requires being in the continuous limit, it can be expected that galaxy catalogs, dominated by discreteness, hardly reach this regime. In this case, one could thus wonder whether the measurement of a function $h$ is really significant. This should be even worse for galaxy samples generated with a sparse sampling strategy (used to optimize the measurement of the two-body correlation function, Kaiser 1986). Also, the limit where a sample really disagrees or not with the scaling property has to be clearly defined, and dilution effects have to be carefully studied.

Figure 1: A thin slice of width $L_{\rm box}/8$ extracted from our three reference samples, i.e., the CDM sample $E$ (left panel), the spherical power-law cluster plunged into a poissonian noise (middle panel), and the fractal generated by a Rayleigh-Levy random walk (right panel).

In this paper, we first aim to list all the spurious effects that can bring systematic errors on the measurement of the VPDF and the CPDF. To stay in the framework of the scaling model of BS, we analyze the functions $\sigma(n,\ell)$ and $h(N,n,\ell)$ rather than the VPDF and the CPDF. Our second objective is indeed to study the practicability of the very sophisticated formalism of BS. Hence, we will give the appropriate procedure to reliably check the existence of a function $\sigma(y)$ and a function $h(x)$, and of course to determine these functions. Through the analysis of dilution effects, we will see that a sparse sampling strategy tends to reduce the scaling regime from which the most important information on the VPDF and on the CPDF can be extracted.
To do that, we shall use three samples as reference cases:

1. a CDM universe (left panel of Fig. 1) generated by a P$^3$M simulation involving 262 144 dark
matter particles (Davis & Efstathiou 1988),

2. a cubical sample containing a power-law spherical cluster immersed in a poissonian noise 
(central panel of Fig. 1), that does not obey the scaling property (2), 

3. a fractal generated by a Rayleigh-Levy random walk (right panel of Fig. 1), which should, on 
the contrary, perfectly obey the scaling relation. 

This paper is organized as follows: in § 2 we describe the above three samples. In § 3 we study
the void probability and the function $\sigma$. We try to evaluate the main misleading effects that can
affect its measurement and to propose a procedure to determine fairly the function $\sigma$, if it exists.
Section 4 is devoted to the function h. We analyze in detail the predictions of BS for this function 
and we test the behavior of the system when it is diluted. We aim to see when a function h might 
exist and what are the possible contamination effects that may hide it, such as finite volume effects. 
Conclusions are presented in § 5. 

2 The samples 

2.1 The CDM sample E 

Our CDM sample (left panel of Fig. 1) was generated by Davis & Efstathiou (1988) with a P$^3$M code. It contains $N_{\rm par} = 262\,144$ matter particles, its physical size is $L_{\rm box}$, and it was evolved until the variance computed in a sphere of radius $8\,h^{-1}$ Mpc reached $1/b^2$ with $b = 2.5$. The CPDF and the VPDF were computed by BSD for cubic cells of size $\ell$ in the range $-2.6 < \log_{10}(\ell/L_{\rm box}) < -1.0$. These lower and upper limits were chosen in order to avoid respectively smoothing and "periodisation" effects. In the following we shall express all lengths in units of $L_{\rm box}$.

BSD have measured the functions $\sigma$ and $h$ in the non-linear regime ($\bar\xi_2 > 1$). In particular, they found that $h(N,n,\ell)$ did indeed scale as a function $h(x)$. They proposed a phenomenological fit for the function $h$, that we shall recall in § 4. They found that the transform (11) applied to this fit reproduced well the measured function $\sigma$. However, all this work was done at constant number density $n = 262\,144$. A full test of the scaling model needs $n$ to vary. Moreover, the current three-dimensional galaxy catalogs involve at most a few thousand objects. So, to really test and understand the scaling model and its domains of use, it will be interesting to dilute our CDM sample, which is known to exhibit the scaling properties predicted by BS.



2.2 The sample dominated by a cluster Ec 

If the statistics is dominated by a single spherical and locally poissonian cluster with a power-law average profile, the low-order correlations do not obey the scaling relation (2). Indeed, following LSS, let us consider a cluster of radius $R$ for which the number density is

$$n(r) = A\,r^{-\alpha}, \quad r < R; \qquad n(r) = 0, \quad r > R; \qquad 1.5 < \alpha < 3. \qquad (12)$$

According to LSS, when $r \ll R$, the correlation function verifies $\xi_2 \propto r^{3-2\alpha}$ and the low-order correlations scale as (Peebles & Groth 1975)

$$\frac{\xi_3}{\xi_2^2} \propto r^{\alpha-3}, \qquad \frac{\xi_4}{\xi_2^3} \propto r^{2(\alpha-3)}. \qquad (13)$$

So the scaling relation is not obeyed (because $\alpha \ne 3$). We have synthesized such a cluster but, for reasons of normalization and in order to get a more realistic $P_N(\ell)$, we have immersed it in a white noise: the density profile is then given by

$$n(r) = A\,r^{-\alpha}, \quad r < R; \qquad n(r) = A\,R^{-\alpha}, \quad r > R. \qquad (14)$$

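The cluster is located at the center of the sample, which is cubical, of size $L_{\rm box} = 1$. The normalization $\langle n(r) \rangle = n$ implies

$$n = A\,R^{-\alpha} \left[ 1 + \frac{4\pi\alpha}{3(3-\alpha)}\,R^3 \right]. \qquad (15)$$

The calculation of the two-body correlation function gives, when $r \ll R$,

$$\xi_2(r) \simeq 2\pi I A^2 n^{-2} r^{3-2\alpha}, \qquad (16)$$

with

$$I = \int_0^\infty x^{2-\alpha}\,dx \int_{-1}^{1} d\mu\,(x^2 + 1 + 2x\mu)^{-\alpha/2}. \qquad (17)$$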
Figure 2: Quantities $S_Q = \bar\xi_Q/\bar\xi_2^{\,Q-1}$ versus scale in logarithmic coordinates, as measured in $E_c$ for $Q = 3, 4, 5$. $S_Q$ increases with $Q$. The dashed lines are power laws of index $(Q-2)(\alpha-3)$ (eqs. [12], [13]).



Once $\alpha = (3+\gamma)/2$ is fixed, the choice of the number $nL_{\rm box}^3$ of objects in the sample and of the correlation length $r_0$ determines $A$ and $R$. We have taken

$$\gamma = 1.8, \qquad r_0 = 0.087, \qquad nL_{\rm box}^3 = 32\,768, \qquad (18)$$

which leads to a = 2.4, A = 1.7 10"^^ and R = 0.19. 

The corresponding set is displayed in Fig. 1 (central panel). Since the white noise part of $E_c$ is not correlated, expressions (13) are still valid. This is illustrated by Fig. 2, which displays the quantities $S_Q$, $Q = 3, 4, 5$, as functions of scale. The function $P_N(\ell)$, as measured on $E_c$ for cubic cells of size $\ell$ in the non-linear regime ($\bar\xi_2 \gtrsim 1$), $-2.6 \le \log_{10}\ell \le -0.8$, is given by Fig. 3. At fixed scale, it presents a bump in the vicinity of its maximum, corresponding to the poissonian noise that globally dominates the statistics (dashed curves), followed at larger $N$ by a power law and a second bump above which $P_N(\ell)$ vanishes. The large $N$ part of the CPDF is dominated by the cluster statistics and can be easily evaluated (CBS). In particular, one can show that the power-law part of $P_N(\ell)$ verifies $P_N(\ell) \propto N^{-3/\alpha - 1}$.



2.3 The Rayleigh-Levy fractal F 

To have a reliable element of comparison, we have synthesized a sample $F$ which obeys the scaling property (2). It is a fractal involving 32 768 points generated by a Rayleigh-Levy random walk (see, e.g., Mandelbrot 1975, 1982, Peebles 1980, Colombi et al. 1992). Starting from a random point in the sample, the next point is chosen in a random direction and at a distance $r$ with probability

$$p(r > \ell) = (\ell/\ell_p)^{-\epsilon}, \quad \ell \ge \ell_p; \qquad p(r > \ell) = 1, \quad \ell < \ell_p. \qquad (19)$$

Figure 3: Logarithm of the count probability $P_N(\ell)$ measured on $E_c$ as a function of $\log_{10} N$, for various values of scale. The dashed lines correspond to the Poisson model.

The process is repeated taking the new point as reference. To obtain the same two-body correlation function as in $E_c$, we chose $\epsilon = 3 - \gamma = 1.2$, $\ell_p$ = 1.24 10"^. We refer to LSS (pages 245, 248) or Appendix A for the expression of $\xi_2(r)$ in terms of $\epsilon$, $n$ and $\ell_p$. The finiteness of $\xi_2$ is ensured by the fact that our sample is contained in a cube of size $L_{\rm box}$ and involves a finite number of objects (so $F$ is not actually a real fractal). Figure 1 (right panel) shows a slice of width $L_{\rm box}/8$ extracted from $F$. In Appendix A, we show, generalizing a calculation of Peebles for the low-order correlations (LSS, page 248), that this sample obeys the scaling relation (see also Hamilton & Gott 1988). We compute its function $\sigma$ and find the approximate expression

$$\sigma(y) \simeq (1 + y/2)^{-1}. \qquad (20)$$

The function $h$ associated with equation (20) is

$$h(x) \simeq 4\exp(-2x). \qquad (21)$$
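
As a short consistency check (a worked calculation added here for illustration, assuming that the transform (11) reads $\sigma(y) = y^{-1}\int_0^\infty (1 - e^{-yx})\,h(x)\,dx$), the exponential form $h(x) = 4\exp(-2x)$ does give back $\sigma(y) = (1 + y/2)^{-1}$, and its first two moments are equal to unity:

```latex
\sigma(y) = \frac{1}{y}\int_0^\infty \left(1 - e^{-yx}\right) 4\,e^{-2x}\,dx
          = \frac{4}{y}\left(\frac{1}{2} - \frac{1}{2+y}\right)
          = \frac{2}{2+y}
          = \left(1 + \frac{y}{2}\right)^{-1},
\qquad
\int_0^\infty x\,h(x)\,dx = \int_0^\infty x^2\,h(x)\,dx = 1 .
```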

As for $E_c$, we have measured the count probability for cubic cells of size $\ell$ in the scale range $-2.6 \le \log_{10}\ell \le -0.8$. This sample is especially interesting, since we can test the predictions of the scaling model exactly on it. We are able to see whether they are verified in practice, despite the possible misleading effects we describe in the following. With this sample and $E_c$, we have two extreme reference cases to decide whether a function $\sigma$ or a function $h$ exists or not.
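
The construction of a Rayleigh-Levy random walk described above can be sketched as follows. This is an illustrative implementation only: step lengths are drawn by inverse-transform sampling of $p(r > \ell) = (\ell/\ell_p)^{-\epsilon}$ for $\ell \ge \ell_p$, directions are isotropic, and positions are wrapped periodically into the unit cube. The periodic wrapping, the function names and the value of `l_p` used in the usage note are assumptions of this sketch and are not taken from the text.

```python
import math
import random


def levy_step(rng, eps, l_p):
    """Draw a step length r with p(r > l) = (l / l_p)**(-eps) for l >= l_p."""
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return l_p * u ** (-1.0 / eps)


def rayleigh_levy_flight(n_points, eps, l_p, seed=0):
    """Generate n_points positions of a Rayleigh-Levy walk in the unit cube.

    Assumption of this sketch: the walk is wrapped periodically so that
    every point stays inside the box.
    """
    rng = random.Random(seed)
    pos = [rng.random(), rng.random(), rng.random()]
    points = [tuple(pos)]
    while len(points) < n_points:
        step = levy_step(rng, eps, l_p)
        # Isotropic direction: uniform in cos(theta) and in azimuth.
        mu = 2.0 * rng.random() - 1.0
        phi = 2.0 * math.pi * rng.random()
        s = math.sqrt(1.0 - mu * mu)
        direction = (s * math.cos(phi), s * math.sin(phi), mu)
        pos = [(pos[k] + step * direction[k]) % 1.0 for k in range(3)]
        points.append(tuple(pos))
    return points
```

For example, `rayleigh_levy_flight(32768, 1.2, 1e-3)` produces a set with the same number of points and the same index $\epsilon$ as $F$; the value `1e-3` for $\ell_p$ is purely hypothetical.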

3 The void probability (VPDF) 

Strictly speaking, the notion of VPDF is meaningless for a continuous matter density field. Nevertheless, we can consider each matter particle of our CDM sample as a galaxy, as we shall do in the following. Then, one can measure the quantity $\sigma(n,\ell)$ defined in the introduction as a function of $N_c$ for various number densities $n$ to have an idea of the behavior of $P_0(n,\ell)$. For example, the first step is to see whether the sample deviates from a pure poissonian distribution, for which $\sigma = 1$. Moreover, in this system of coordinates, if the scaling relation (2) applies, $\sigma$ should scale as a single function $\sigma(N_c)$. One may however argue that a large class of samples may exhibit a function $\sigma(N_c)$, even if they do not obey the scaling property. The two samples $E_c$ and $F$ we have generated are here to specifically address this problem. Some artificial deviations from the scaling relation may be induced by misleading effects, such as "grid effects" (taking place only in numerical simulations) and finite sample effects, that are studied in detail in Appendix B. We give the main results in § 3.4. The scaling relation may also work only in a finite range of scales, which does not exclude the existence of a function $\sigma(N_c)$ if only this regime is taken into account.

This section is thus organized as follows: in § 3.1, we recall some aspects of the formalism of BS. Section 3.2 studies and compares two ways of randomly diluting a sample, an analytical one and an experimental one. In § 3.3, we measure the function $\sigma$ in $E_c$ and $F$. Section 3.4 deals with spurious effects. In § 3.5, we study the scaling behavior of the function $\sigma$ measured in our CDM sample $E$.

3.1 Scaling model and VPDF 

Here we assume that the scaling property (2) applies. It is argued in BS that the function $\sigma(N_c)$ should have a power-law behavior at large $N_c$, i.e.,

$$\sigma(N_c) \propto a N_c^{-\omega}, \qquad (22)$$

with $0 \le \omega \le 1$ and $a > 0$. The measurement of $\sigma$ in the observed galaxy distribution is in agreement with equation (22). The CfA data provide $\omega \simeq 0.5 \pm 0.1$ (Alimi et al. 1990) and the Southern Sky Redshift Survey (SSRS) data lead to $\omega \simeq 0.7 \pm 0.1$ (Maurogordato et al. 1992). In universes of matter coming from numerical simulations, the value of $\omega$ seems to depend on initial conditions (see, e.g., BH). In our CDM sample, BSD have measured (for the distribution of matter particles, whereas the previous values concern the observed galaxy distribution, which is expected to be biased with respect to the matter distribution)

$$\omega \simeq 0.4 \pm 0.05. \qquad (23)$$

Let us define the number $N_v$ by

$$P_0 = \exp(-N_v^{1-\omega}), \qquad \omega < 1, \qquad (24)$$

and $N_v$ = for $\omega = 1$. The number $N_v$ can be considered as the typical number of objects in a cell located in an underdense region. The typical size $\ell_v$ of a void naturally appears from the condition

$$N_v(\ell_v) = 1. \qquad (25)$$



3.2 Dilution and VPDF 

The function $\sigma = -\ln(P_0)/(n\ell^3)$ in principle depends on two variables, i.e., the average number density $n$ and the scale $\ell$. To check whether the scaling relation is fulfilled, that is whether $\sigma(n,\ell) = \sigma(N_c)$, it is useful to display it as a function of $N_c$ for various number densities $n$, to cover all the dynamic range of possible values of $(n,\ell)$. Samples of various densities can be obtained by randomly extracting from the studied sample $E$ some subsamples $E_i$ ($i$ is the number of objects) with average number densities $n_i = n\,i/N_{\rm par}$. These subsamples should have the same shape and the same volume as $E$, since each of them is formally a discrete realization of the same underlying density field as $E$, but with a smaller number density. However, in observational data samples, this experimental dilution is often made by extracting several volume limited samples, so this condition is not verified, and some artifacts due to the variation of the physical size of the subsamples can contaminate the measurement.



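On the other hand, if the function $P_N(n,\ell)$ of $E$ is known, this random dilution can also be done analytically (see Hamilton 1985, Hamilton et al. 1985). Indeed, one can easily show, using equation (4), that the CPDF $P_N(n,\ell)$, $N \ge 1$, can be obtained from the VPDF $P_0(n,\ell)$ through the following derivation (White 1979, BS)

$$P_N(n,\ell) = \frac{(-n)^N}{N!}\,\frac{\partial^N P_0(n,\ell)}{\partial n^N}. \qquad (26)$$

This relation implies, by a simple Taylor expansion,

$$P_0(n_i,\ell) = \sum_{N=0}^{\infty} \left(1 - \frac{n_i}{n}\right)^N P_N(n,\ell). \qquad (27)$$

One can moreover compute the count probability $P_N(n_i,\ell)$ of the subsample from $P_N(n,\ell)$ by applying equation (26) to equation (27):

$$P_N(n_i,\ell) = \sum_{K=N}^{\infty} C_K^N \left(\frac{n_i}{n}\right)^N \left(1 - \frac{n_i}{n}\right)^{K-N} P_K(n,\ell), \qquad (28)$$

with $C_K^N = K!/[N!(K-N)!]$. The series expansion (27) converges at least for $n_i/n < 2$; the VPDF of a "subsample" denser than $E$ can therefore be predicted, with increasing error for increasing density. This error arises especially from the finiteness of the sample, which prevents the function $P_N$ from being accurately calculated at large $N$.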
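
The analytical dilution can be sketched numerically as follows, assuming that it amounts to the series $P_0(n_i) = \sum_N (1 - n_i/n)^N P_N(n)$ for the void probability and to a binomial thinning of the counts for $N \ge 1$. The function names and the list representation of the CPDF (index $N$ holds $P_N$, truncated at the largest measured count) are assumptions of this example.

```python
import math


def dilute_void_probability(p_n, f):
    """P_0 at density n_i = f * n from the CPDF p_n measured at density n.

    Evaluates sum_N (1 - f)**N * P_N; f may exceed 1 (virtual increase
    of the density), the series being expected to converge for f < 2.
    """
    return sum((1.0 - f) ** n * p for n, p in enumerate(p_n))


def dilute_cpdf(p_n, f):
    """Full CPDF at density n_i = f * n (0 <= f <= 1) by binomial thinning."""
    k_max = len(p_n) - 1
    diluted = []
    for n in range(k_max + 1):
        total = 0.0
        for k in range(n, k_max + 1):
            # Probability of keeping n objects out of k, each with chance f.
            total += math.comb(k, n) * f**n * (1.0 - f) ** (k - n) * p_n[k]
        diluted.append(total)
    return diluted
```

A simple sanity check is that a Poisson distribution of mean $\mu$ diluted by a factor $f$ remains a Poisson distribution of mean $f\mu$, which both functions reproduce. The truncation of `p_n` at a finite count is the numerical counterpart of the finite-sample limitation mentioned above.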

Figure 4 shows $\sigma$ as a function of $N_c$, as measured in our CDM simulation (triangles) and two randomly extracted subsamples $E_{32768}$ (squares) and $E_{1024}$ (pentagons). The dashed curves represent analytical dilutions of $E$ and of $E_{32768}$ with respective dilution factors $n/n_{32768} = 8$ and $n_{32768}/n_{1024} = 32$. They nearly superpose on the squares and the pentagons, as expected. This measurement shows that, even if the number of points in the considered catalog is small, the direct determination of the VPDF in this set is accurate (in the available dynamic range). In other words, errors related to statistical poorness are small. Indeed, the analytical dilution (27) uses the statistics of the full set $E$ and should thus be more accurate than the experimental dilution.

One can similarly test equation (27) with $n_i/n = 2$. Figure 5 displays $\sigma$ as a function of scale for $E_{1024}$ (pentagons) and $E_{512}$ (hexagons). (We use this system of coordinates to easily distinguish between the two subsamples.) The dashed curve represents the function $\sigma$ as it should be measured on $E_{1024}$, applying equation (27) to $E_{512}$ with $n_i/n = 2$. Again, the agreement between the measurement and the prediction is good, although the two subsamples are statistically poor.

Note that the available dynamic range where the function $\sigma$ significantly differs from unity of course decreases with decreasing number density $n$. Now, the interesting properties of a scale invariant distribution are reached at large $N_c$ (eq. [22]), which of course argues against a sparse sampling strategy. We shall discuss this problem again in § 3.5.

Figure 4: $\sigma = -\ln P_0/(n\ell^3)$ as a function of $N_c = n\ell^3\bar\xi_2$ in logarithmic coordinates. The triangles refer to our CDM sample $E$, the squares and the pentagons to the randomly extracted subsamples $E_{32768}$ and $E_{1024}$ respectively. The dashed curves correspond to an analytical dilution of $E$ and $E_{32768}$ by a factor 8 and a factor 32 respectively; they should superpose on the squares and the pentagons.

Figure 5: $\log_{10}\sigma$ as a function of logarithm of scale in the very dilute regime. Pentagons correspond to $E_{1024}$, hexagons to $E_{512}$. The dashed curve represents a virtual increase in number density by a factor 2 applied to $E_{512}$. It should superpose on the pentagons.

3.3 How significant is a measurement of the function $\sigma$?

Let us now see whether, at first glance, the measurement of $\sigma$ gives appropriate results for our reference sets $E_c$ and $F$. Figure 6 shows $\sigma$ as a function of $N_c$, as measured on $E_c$ (left panel) and on $F$ (right panel), for several number densities $n_i = 2n, n, n/8, n/39.5, n/512, n/1024$ (that we also test on our CDM sample $E$), using the analytical procedure given by equation (27). The stars (on the left panel) correspond to the cases $n_i < n$, the filled points to the direct measurement, and the open symbols to the case $n_i = 2n$. As expected, the fractal exhibits a unique function $\sigma(N_c)$, that is nearly identical to the theoretical expectation (20). The set $E_c$ does not scale as a function $\sigma(N_c)$ as well as $F$. Its function $\sigma$ presents a plateau where $\sigma \simeq 1$ and a sudden cutoff at $N_c \simeq 10$. The plateau is not surprising, since $E_c$ is dominated by the poissonian distribution that surrounds the central cluster. The main difference with a pure poissonian sample is the existence of a large range of values of $N_c$, which is provided by the presence of the cluster. A poissonian sample, that is not correlated, has $N_c = 0$. The points that verify $P_0(\ell) < \ell^3$ are circled in the left panel of Fig. 6. They precisely correspond to the sharp cutoff at large $N_c$. As we shall see in the next section, this cutoff does not reflect the intrinsic properties of the underlying distribution, but some special properties of the largest void contained in this particular realization. The set $F$ has much larger empty regions, so none of the studied scales is liable to exhibit such an abnormal behavior. Therefore, with regard to $E_c$ and $F$, a careful measurement of the function $\sigma$ seems to really reflect the underlying behavior of the studied distribution.

Figure 6: Logarithm of $\sigma$ as a function of logarithm of $N_c$ for $E_c$ (left panel) and $F$ (right panel). The filled points correspond to the direct measurement, the stars on the left panel to various analytical dilutions varying from a factor 8 to 1024, and the open symbols to a virtual multiplication of the average number density by a factor 2. The circled points on the left panel correspond to scales where $P_0(\ell) < \ell^3$. The dotted-dashed curve on the right panel is the theoretical expectation.



3.4 Misleading effects 

We now consider possible misleading effects on the VPDF. We first discuss "grid effects" that may take place in numerical simulations. Then, we look at finite sample effects on the VPDF and on the function $\sigma$. This section ends with a discussion of a recent measurement of the function $\sigma$ by Vogeley et al. (1991) on the CfA catalog. We also see what happens to our CDM sample $E$.
3.4.1 Grid effects



Grid effects only exist in numerical simulations. They are linked to the fact that the information associated with the grid of initial particle positions in the simulation has not been completely destroyed in large underdense regions, where there is no shell-crossing. Their consequence, as shown in Appendix B.1, is that the VPDF is artificially underestimated at large scales. They are negligible when

$$P_0 \gtrsim 1/e, \qquad (29)$$

or equivalently when $N_v \lesssim 1$ in the formalism of BS. The scales for which $P_0 < 1/e$ cannot be reliably tested and have to be removed. This decreases the available dynamic range in which the scaling of $\sigma$ as a function $\sigma(N_c)$ can be tested. These effects do not exist in galaxy catalogs.

3.4.2 Finite sample effects and the VPDF 

In practice, the VPDF is measured by randomly throwing a certain number $C_{\rm tot}$ of cells
of volume $v = \ell^3$ in the sample. It is then easy to show that the standard deviation of $P_0$ is related to
$C_{\rm tot}$ (see for instance Hamilton 1985) through the following expression

$$\left(\frac{\Delta P_0}{P_0}\right)^2 = \frac{1 - P_0}{C_{\rm tot}\, P_0}. \tag{30}$$

But, as already discussed for example by Maurogordato & Lachieze-Rey (1986), since the sample
is of finite volume, the number of statistically independent cells $C_{\rm tot}$ is not arbitrarily large and
may depend on the average number density $n$ and on the scale $\ell$.

For a pure poissonian (uncorrelated) distribution, a natural guess is simply

$$C_{\rm tot} \simeq \frac{V_{\rm sample}}{\ell^3}, \tag{31}$$

which roughly gives the number of "statistically independent" cells of volume $v = \ell^3$ contained in
the sample. This estimate is valid for a moderately small VPDF (see Appendix B.2.1). In the
case the VPDF is very small, a correction to equation (31) is needed and Ctot is rather of the order 
of 0.07Vsamplei~^{ni^)^ for spherical cells (see Appendix B.2.1, B.2.2). 
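
As a rough numerical illustration of the poissonian estimate, the sketch below evaluates the relative error on $P_0$ for an uncorrelated sample. It assumes the binomial form $(\Delta P_0/P_0)^2 = (1 - P_0)/(C_{\rm tot} P_0)$ with $C_{\rm tot} \simeq V_{\rm sample}/\ell^3$, and it ignores the correction needed when the VPDF is very small. The function names and the numerical values are illustrative only.

```python
import math

def independent_cells(v_sample: float, ell: float) -> float:
    """Rough number of statistically independent cells of volume ell**3."""
    return v_sample / ell**3

def relative_error_p0(p0: float, c_tot: float) -> float:
    """Binomial relative error on P0 measured with c_tot independent cells."""
    return math.sqrt((1.0 - p0) / (p0 * c_tot))

# Illustrative values (assumptions, not taken from the samples of the paper).
v_sample = 1.0                  # sample volume, arbitrary units
ell = 0.1                       # cell size in the same units
n = 500.0                       # average number density
p0 = math.exp(-n * ell**3)      # Poisson void probability
c_tot = independent_cells(v_sample, ell)
err = relative_error_p0(p0, c_tot)
print(f"C_tot = {c_tot:.0f}, P0 = {p0:.3f}, dP0/P0 = {err:.4f}")
```

With these numbers the relative error stays at the few percent level, which is the regime where the simple estimate is expected to be usable.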

The case of a correlated set is more complicated. Indeed, correlations at scales larger than
the sample size are then likely to affect the measurement of the VPDF. It is not easy to see where
they intervene in equation (30). Our aim here is to try to clarify this situation. However, we take
a slightly different approach from the one used by previous authors, which was generally based on
equation (30) (with a somewhat uncertain $C_{\rm tot}$).

The technical issues are detailed in Appendix B.2, where we evaluate the error associated with
finite volume effects on the VPDF, assuming that the hierarchical model applies, i.e., that the
$Q$-body correlation function can be written as a sum of products of $Q - 1$ terms in $\xi_2$. We think that
the result can be reasonably generalized to any sample which does not disagree "too much" with
the hierarchical model (an independent estimation using the top hat model leads to similar results,
see Appendix B.2). The error on the VPDF is roughly approximated by

$$\left(\frac{\Delta P_0}{P_0}\right)^2 \simeq S_1 + S_2, \qquad S_1 = \frac{1 - P_0}{P_0}\,\frac{\ell^3}{V_{\rm sample}}, \qquad S_2 = -2\,(n\ell^3)^2\,\sigma'\,\bar\xi_2(L_{\rm sample}), \tag{32}$$

where $\sigma'$ stands for the partial derivative of function $\sigma(N_c, n)$ with respect to $N_c$, and $\bar\xi_2(L_{\rm sample})$
symbolizes the double integral of $\xi_2(r_1, r_2)/V_{\rm sample}^2$ over the sample volume $V_{\rm sample} \sim L_{\rm sample}^3$.

The first right hand side term $S_1$ of equation (32) can easily be understood. It is equal to the
right hand side of equation (30), with $C_{\rm tot}$ given by equation (31). It therefore simply corresponds
to the expected "poissonian" error due to the fact that the number of statistically independent
cells of volume $v = \ell^3$ contained in the sample is finite. As is the case for a pure poissonian
sample, a correction to $S_1$ is needed when the VPDF is very small (see Appendix B.2.1, B.2.2). For
practical purposes, we neglect this correction. Indeed, when it has to be taken into account, the
relative error $|\Delta P_0/P_0|$ is likely to be close to unity or larger. Another factor a, depending on the
way the system is clustered [i.e., on $\gamma$ and on the shape of function $\sigma(y)$], has also been neglected
in the writing of $S_1$, since it is numerically seen to be of order unity to within an order of magnitude
(and it is of course exactly equal to unity in the poissonian case). The second right hand side term
$S_2$ of equation (32) is brought by fluctuations of the underlying density field at wavelengths larger
than the sample size, which induce additional correlations.

We see that, whatever the value of the second right hand side term, there is a scale $\ell_{\rm cut}(n)$
above which the measurement of the VPDF is not statistically significant. At this scale, which is
defined by

$$P_0(n, \ell_{\rm cut})\,\frac{V_{\rm sample}}{\ell_{\rm cut}^3} \simeq 1, \tag{33}$$

there is typically only one independent empty cell. If $\ell \gtrsim \ell_{\rm cut}$, the VPDF is dominated by the
largest void of the sample. In a set of infinite volume and with an infinite number of objects (but
a finite number density $n$), we should have an infinite distribution of voids of arbitrary size. This
is not the case for a realization of finite volume, in which the size of the largest void is necessarily
smaller than the size of the sample.
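
To make the cut scale concrete, the following sketch solves the condition $P_0(n, \ell_{\rm cut})\, V_{\rm sample}/\ell_{\rm cut}^3 = 1$ by bisection. It assumes a purely poissonian void probability, $P_0 = \exp(-n\ell^3)$, which is a simplifying assumption: for a clustered sample the measured $P_0(n, \ell)$ would have to be used instead. The function name and the numbers are illustrative.

```python
import math

def cut_scale_poisson(n: float, v_sample: float, iterations: int = 200) -> float:
    """Scale at which P0 * V_sample / l**3 = 1 for a Poisson sample."""
    def log_count(ell: float) -> float:
        # ln(P0 * V / l^3) with P0 = exp(-n l^3); decreases monotonically with l
        return -n * ell**3 + math.log(v_sample) - 3.0 * math.log(ell)

    low = 1e-6 * v_sample ** (1.0 / 3.0)   # log_count > 0 here
    high = v_sample ** (1.0 / 3.0)         # log_count = -n V < 0 here
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if log_count(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

n, v_sample = 1000.0, 1.0                  # illustrative values
l_cut = cut_scale_poisson(n, v_sample)
residual = math.exp(-n * l_cut**3) * v_sample / l_cut**3
print(f"l_cut = {l_cut:.4f}, P0 V / l_cut^3 = {residual:.6f}")
```

For these values the cut scale is a modest fraction of the sample size, which shows how quickly the usable range of scales shrinks once voids become rare.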

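If the formalism of BS applies, $S_2$ can be rewritten, for $N_c \gg 1$,

$$S_2 = 2\omega\, n\ell^3 \sigma(N_c)\,\frac{\bar\xi_2(L_{\rm sample})}{\bar\xi_2(\ell)} = -2\omega \ln(P_0)\,\frac{\bar\xi_2(L_{\rm sample})}{\bar\xi_2(\ell)}. \tag{34}$$

One can thus expect this term to be in general small for a moderately small VPDF. The poissonian
error $S_1$ also rapidly becomes very small if $\ell$ gets small compared with $\ell_{\rm cut}$.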

3.4.3 Finite sample effects and function $\sigma$

We now wish to estimate the finite sample error on the quantity $\sigma = -\ln(P_0)/(n\ell^3)$. Assuming that
the uncertainties $\Delta P_0/P_0$ and $\Delta n/n$ are small and that the indicators $P_0$ and $n$ are statistically
independent, we might use the standard error propagation formula and write

$$\left(\frac{\Delta\sigma}{\sigma}\right)^2 = \frac{1}{(n\ell^3\sigma)^2}\left(\frac{\Delta P_0}{P_0}\right)^2 + \left(\frac{\Delta n}{n}\right)^2, \tag{35}$$

where

$$\left(\frac{\Delta n}{n}\right)^2 = \frac{1}{n V_{\rm sample}} + \bar\xi_2(L_{\rm sample}). \tag{36}$$

However, the VPDF and the number density $n$ are strongly correlated. If $n$ increases, then $P_0$
decreases. In other words, equation (35) is likely to overestimate the real uncertainty on $\sigma$. It is
therefore certainly more realistic to use the following estimate

$$\frac{\Delta\sigma}{\sigma} \simeq \left|\frac{1}{n\ell^3\sigma}\,\frac{\Delta P_0}{P_0} - \frac{\Delta n}{n}\right|. \tag{37}$$

This expression implies that, as expected, $\Delta\sigma/\sigma \sim 0$ when $P_0$ is close to unity (or, equivalently, when
$n\ell^3\sigma \sim 0$), which was not the case with the more approximate evaluation (35).
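
The difference between the two estimates can be checked numerically in the poissonian limit. The sketch below assumes $\sigma = 1$, $P_0 = \exp(-n\ell^3)$, the poissonian error $(\Delta P_0/P_0)^2 = [(1 - P_0)/P_0]\,\ell^3/V_{\rm sample}$ and $(\Delta n/n)^2 = 1/(nV_{\rm sample})$ (no large-scale correlation term). It compares the quadratic sum with the absolute difference $|(n\ell^3\sigma)^{-1}\Delta P_0/P_0 - \Delta n/n|$. All names and values are illustrative.

```python
import math

def sigma_errors_poisson(n: float, ell: float, v_sample: float):
    """Return (quadratic-sum estimate, anticorrelated estimate) of d(sigma)/sigma."""
    cell = n * ell**3                      # n l^3 sigma with sigma = 1
    p0 = math.exp(-cell)
    dp0 = math.sqrt((1.0 - p0) / p0 * ell**3 / v_sample)   # poissonian term only
    dn = 1.0 / math.sqrt(n * v_sample)                      # shot noise on n
    quadratic = math.sqrt((dp0 / cell) ** 2 + dn**2)
    anticorrelated = abs(dp0 / cell - dn)
    return quadratic, anticorrelated

# Small cells: P0 is close to unity (illustrative values).
quad, anti = sigma_errors_poisson(n=1000.0, ell=0.01, v_sample=1.0)
print(f"quadratic sum: {quad:.5f}, anticorrelated estimate: {anti:.2e}")
```

In this limit the quadratic sum stays close to $\sqrt{2}\,\Delta n/n$, whereas the anticorrelated estimate nearly vanishes, as expected when $P_0$ approaches unity.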
When $\ell \ll \ell_{\rm cut}$, it is easy to see that

$$\frac{\Delta\sigma}{\sigma} \lesssim \frac{\Delta n}{n}. \tag{38}$$

This error should therefore be small if the size of the studied sample is large in comparison with
its correlation length and if $nV_{\rm sample} \gg 1$, which is the case in volume limited subsamples of
current observational data, which involve at least a few hundred objects.

In fact, when one displays $\sigma$ as a function of $N_c$, an additional (and much larger) error is
brought by the uncertainty on the determination of $N_c = n\ell^3\bar\xi_2$ (see Hamilton 1993 for a detailed
study of the errors on the two-body correlation function). Indeed, the function $\bar\xi_2$ is very sensitive to
finite volume effects and is likely to be quite underestimated at large scales (CBS). The consequence
is that the measured $N_c$ reaches its maximum much sooner than expected. In Appendix B.3, we
estimate the error linked to the finiteness of the sampled volume on function $\bar\xi_2$. For a set obeying
the scaling property, one obtains

$$\left(\frac{\Delta\bar\xi_2}{\bar\xi_2}\right)^2 \sim \bar\xi_2(L_{\rm sample}). \tag{39}$$

This expression is valid when the considered scale $\ell$ is small enough compared to the sample size
$L_{\rm sample}$ (see Appendix B.3). (Note that the relative error on $N_c$ is slightly larger, since one also has
to take into account the error on the average number density $n$, which is small in comparison
with the uncertainty associated with the two-body correlation.) In addition, we must not forget the
error on $\bar\xi_2$ associated with the sample discreteness (i.e., the fact that the number of sampled pairs is finite,
see, e.g., Peebles 1973, LSS, p. 189 and Hamilton 1993), which we do not estimate here.



3.4.4 Comments 

When Vogeley et al. (1991) measure the function $\sigma$ in the CfA catalog, they conclude that there
is a deviation from the scaling relation at $\ell > 10\,h^{-1}$ Mpc. But at such a large scale, the measured
number $N_c$ is, as discussed above, likely to be underestimated. Moreover, the measured function $\sigma$
becomes an increasing function of $\ell$ because the sample is dominated by its largest void (Appendix
B.2.2). In this case, the measurement of $P_0$ is not statistically significant, so no conclusion can be
reached about the scaling behavior of the underlying galaxy distribution at such scales, not because
the scaling does not exist, but because the sample is too small. This point has also been discussed
by Maurogordato et al. (1992).

With regard to our CDM sample $E$, we know from CBS that finite volume effects are negligible
on $N_c$ in the scaling range measured. Such effects should be even less significant on function $\sigma$.
Actually, grid effects are here more important than finite volume effects, and the condition $P_0 > 1/e$
is stronger than the condition $\ell < \ell_{\rm cut}$.

3.5 Extracting the possibly scaling part of the VPDF

In Fig. 7, we give $\sigma$ as a function of $N_c$ for $E$ and several subsamples extracted at random (with
32768, 6634, 1024 and 512 objects). Open triangles represent a virtual increase of the average
number density of $E$ by a factor 2. The points for which $P_0 < 1/e$ have been removed, although
grid effects should be less significant for randomly diluted subsamples, particularly if the dilution
factor is large. Indeed, the information linked to a grid is partially destroyed by random dilution.
The deviation from the scaling, even if it still exists, is much less apparent than in Fig. 4. But as
noticed above, removing the scales for which $P_0 < 1/e$ decreases the available dynamic range and
enhances the apparent existence of a function $\sigma(N_c)$.

We also know from the measurement of the low-order averaged matter correlations (CBS) in
this sample that there should be a deviation from the scaling relation when $\log_{10}\ell > -1.6$, because
of the transition around $\ell_0$ between the highly non-linear regime and the weakly non-linear regime,
which exhibit here different values of $S_Q$ for $Q \leq 5$ (see also § 4.2.1 hereafter). To measure the
function $\sigma$, if it exists, one must first select the scale range $\ell_{\rm scaling}$ where the scaling relation
is expected to apply. The measurement of the low-order correlations provides at least the upper
bound of $\ell_{\rm scaling}$ (the lower bound of $\ell_{\rm scaling}$ is difficult to evaluate in numerical samples, because
at very small scales, the low-order correlations are affected by discreteness and also by numerical
errors linked to the smoothing of the forces in the program solving the equations of motion). This
measurement must be done with care, since even the low-order correlations are quite sensitive
to finite sample effects (CBS). In Fig. 7, the points that verify $\log_{10}\ell \leq -1.6$ are circled. They
unambiguously define a single function $\sigma(N_c)$. This shows that the measured low-order statistics
are here in agreement with the measured high-order statistics (function $\sigma$ is related to the behavior
of the averaged correlations at any order), as far as the scaling relation is concerned.

The dotted-dashed curve in Fig. 7 is the function $\sigma(N_c)$ predicted by the transformation (11)
applied to the function $h(x)$ measured on $E$ by BSD (see § 4.1). It is in very good agreement with
the measurement (circled points). This shows that with careful measurements, the predictions of
BS are obeyed in great detail, even for relations between indicators that describe very different
regions of the studied sample: the function $\sigma(N_c)$ tests underdense regions, whereas the function
$h(x)$ tests overdense regions.

Note however that the very dilute subsamples ($n \leq 6634$) provide at most $N_c \sim 2$ when
$\log_{10}\ell = -1.6$: at such an $N_c$, the asymptotic power-law behavior (22) of $\sigma$ is not reached,
although $\sigma$ already significantly deviates from the Poisson expectation ($\sigma = 1$). With regard
to the perfectly scale invariant fractal $F$, for $n = 6634$, we have $N_c(\ell_0) \simeq 40$. With smaller values
of $N_c$ (especially smaller than 10), we would still miss the large-$N_c$ power-law behavior of $\sigma$. This
suggests a lower bound on $N_c$ to determine the power-law behavior of $\sigma$: it has to be at least larger
than a few tens at the correlation length, that is $n\ell_0^3 \gtrsim 30 - 40$, or, in terms of galaxy number
density ($\ell_0 \simeq 2.4\,r_0 \simeq 12\,h^{-1}$ Mpc)

$$n \gtrsim 0.02 - 0.03\,h^3\,{\rm Mpc}^{-3}. \tag{40}$$

This average number density is hardly reached by current observational volume limited subsamples
extracted from data samples, in which $n$ is at most of order $0.01\,h^3\,{\rm Mpc}^{-3}$, or equivalently $N_c(\ell_0) \simeq
15$. This, of course, argues against any sparse sampling strategy. Indeed, since the VPDF is not
very sensitive to finite volume effects, it is more important to have a small but as complete as
possible catalog than a large dilute sample to measure the function $\sigma$.
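
The conversion between a threshold on $N_c(\ell_0)$ and a number density is simple arithmetic, since $\bar\xi_2(\ell_0) = 1$ gives $N_c(\ell_0) = n\ell_0^3$. The sketch below applies it with $\ell_0 \simeq 12\,h^{-1}$ Mpc, as quoted above. The function names are illustrative.

```python
def density_for_nc(nc_at_l0: float, l0: float = 12.0) -> float:
    """Number density (h^3 Mpc^-3) giving N_c(l0) = nc_at_l0, using N_c(l0) = n * l0**3."""
    return nc_at_l0 / l0**3

def nc_for_density(n: float, l0: float = 12.0) -> float:
    """N_c(l0) reached by a sample of number density n (h^3 Mpc^-3)."""
    return n * l0**3

n_low = density_for_nc(30.0)
n_high = density_for_nc(40.0)
nc_current = nc_for_density(0.01)
print(f"n between {n_low:.4f} and {n_high:.4f} h^3 Mpc^-3")
print(f"N_c(l0) for n = 0.01 h^3 Mpc^-3: {nc_current:.1f}")
```

The rounded values are consistent with the density range of equation (40) and with the quoted $N_c(\ell_0)$ of order 15 for current catalogs.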
Figure 7: $\sigma$ as a function of $N_c$ in logarithmic coordinates, as measured in our CDM sample $E$ and
several subsamples extracted at random. The open triangles represent the measurement on a virtual
sample twice denser than $E$, using the analytical prescription (26). The points for which $P_0 < 1/e$
have been removed. The circled points verify $\log_{10}\ell \leq -1.6$. In this scale range, the low-order
correlations obey the scaling relation. The dotted-dashed curve is the analytical transformation
(11) applied to the function $h$ that BSD have measured on $E$.

4 The count probability distribution function (CPDF)



In the scaling model framework, it is useful to measure the quantity $h(N,n,\ell) = N_c^2 P_N(n,\ell)/(n\ell^3)$ as a
function of $x = N/N_c$, for various values of $(n,\ell)$. Indeed, when the scaling relation (2) applies, $h$
behaves like a universal function $h(x)$ in an asymptotic domain $D_h$. In practice, however,
this asymptotic regime is never completely reached and one can expect some contamination effects
to modify the behavior of function $h(N,n,\ell)$ so that it does not scale exactly as a function $h(x)$.
For example, the high $N$ part of the function $P_N(n,\ell)$ presents increasingly large irregularities as
$N$ grows and a sharp cutoff at finite $N$, because of the finiteness of the sample volume, as shown by CBS.

Now, the CPDF is usually studied through its reduced moments, e.g., the averaged correlation
functions $\bar\xi_Q$. Unfortunately, the measurement of such functions is very sensitive to the finiteness
of the sampled volume. CBS have studied such finite volume effects on the CPDF in detail, and
concluded that they could lead to a systematic, strong underestimation of the real low-order moments
of the CPDF in direct measurements. They proposed a method to correct for such defects, or
at least to estimate fair error bars. We recall their main results below.

Once all spurious effects are known, one can wonder whether the apparent existence of a function $h$
is not more or less systematic whatever the studied sample. In such a case, the formalism of BS
would be useless. Once we are convinced that this is not the case, we shall be able to see to
what extent it is possible to detect a function $h(x)$ in the observed galaxy distribution.

This section is thus organized as follows: in § 4.1, we give $D_h$ and the main properties of function
$h$ that have been computed by BS. We study contamination effects brought by discreteness and
finite sample effects in § 4.2 (where we recall the main results of CBS). In § 4.3, we measure function
$h$ on the three reference samples $E$, $E_c$ and $F$. By randomly diluting our CDM sample $E$, we also
see what occurs when the number density of the sample becomes comparable to that reached
in current three-dimensional observational data samples. We then try to determine a criterion on
the galaxy number density to measure the interesting scaling behavior of function $h$, as we did for
the VPDF.

4.1 Some aspects of the scaling model linked to overdense regions 

Here we recall some predictions of BS based on the scaling relation (2).

The invariance property (9) is expected to be reached in the scale and number domain $D_h$ given
by

$$D_h = \{\ell_c \ll \ell \ll \ell_0,\ N \gg 1,\ N \gg N_v\}. \tag{41}$$

$\ell_0 \simeq 2.4\,r_0$ is the (pseudo-)correlation length for which the averaged two-body correlation is unity:

$$\bar\xi_2(\ell_0) = 1. \tag{42}$$

The scale $\ell_c$ is defined by

$$N_c(\ell_c) = 1. \tag{43}$$

It is the typical distance between two objects in a cluster. When $\ell \ll \ell_c$, the system becomes
quasi-poissonian, i.e., $\sigma(N_c) \simeq 1$. The condition $N \gg 1$ is linked to discreteness effects. If $N_v \gg 1$,
the CPDF presents at low $N$ ($N \ll N_c$) an exponential cutoff which scales as a function $g$. The
latter is completely determined once $\omega$ is known (eq. [22]), so its main interest is to help with the
measurement of $\omega$ in the case where the scaling relation applies.

The function $h$ should behave at small $x$ as a power-law:

$$h(x) \propto x^{\omega - 2}, \qquad x \ll 1, \tag{44}$$

and it presents at large $x$ an exponential cutoff

$$h(x) \propto \exp(-|y_s| x), \qquad x \gg 1. \tag{45}$$

It must also obey the normalization conditions (see, e.g., BS)

$$\int_0^\infty x\,h(x)\,dx = 1, \qquad \int_0^\infty x^2 h(x)\,dx = 1. \tag{46}$$

The CPDF measured in our CDM sample was found by BSD to scale properly and determine a
unique function $h$ when $(N,\ell) \in D_h$ (see also § 4.3 hereafter). BSD have fitted $h$ with the following
phenomenological form

$$h(x) = \frac{a(1 - \omega)}{\Gamma(\omega)}\,\frac{x^{\omega - 2}\,e^{-|y_s| x}}{(1 + bx)^c}, \tag{47}$$

which, of course, obeys the normalizations (46), follows the small-$x$ power-law behavior (44) and
has an exponential cutoff (45) at large $x$ with $|y_s| \simeq 0.125$. The parameters $a \simeq 1.8$ and $\omega \simeq 0.4$ are
determined by the measurement of $\sigma(N_c)$ at large $N_c$ (see eq. [22]), and also by the measurement of
function $g$ (see, e.g., BSD). The parameter $|y_s|$ is determined by the measurement of function $h$ at
large $x$. The remaining parameters $b \simeq 3.6$ and $c \simeq 0.8$ are determined by the two constraints (46).

4.2 Misleading effects on the CPDF and on function h 
4.2.1 CPDF and finite volume effects 

The CPDF, as it is measured in universes of matter coming from $N$-body simulations (see, e.g.,
BSD, BH, CBS) or in the galaxy distribution (see, e.g., Alimi et al. 1990), is seen to present, in the
non linear regime, an exponential tail at large $N$, with irregularities more and more pronounced as
$N$ increases and a sharp cutoff at a finite $N = N_{\rm max}$, above which $P_N = 0$. This is illustrated by
figure 8, which displays the quantity $\log_{10} N^2 P_N$ measured in our CDM sample $E$ as a function of
$N$. In current observed three-dimensional galaxy catalogs, for example the SSRS catalog analysed
by Maurogordato et al. (1992), these irregularities are so pronounced that the exponential tail of
the CPDF is difficult to detect. CBS have shown that such irregularities are due to the fact that, at
large $N$, the CPDF is dominated by a few large clusters in the sample, each bump corresponding to
a cluster. The last bump presented by each curve (for $-2.0 \leq \log_{10}\ell \leq -1.6$) in Fig. 8 corresponds
to the largest cluster in our CDM sample (see CBS for a detailed model and discussion).
Such irregularities and the sharp cutoff at $N_{\rm max}$ are of course spurious. In a larger sample, the
cutoff and the bumps would appear further down the large-$N$ tail of the CPDF: the bumps of the
smaller sample would now be smoothed by statistical averaging. Indeed, in an infinite volume, one
can find an infinite number of clusters of any size. The exponential tail exhibited by the CPDF
at large $N$ should thus be extended to infinity. Of course, the range of the exponential tail that
brings a substantial contribution to any statistic depends on that statistic, and it does not need
to be known with precision at arbitrarily large $N$. It can be modeled in the following way

$$P_N(\ell) \simeq A(\ell)\,N^{\eta(\ell)}\,e^{-\beta(\ell) N}, \tag{48}$$

with

$$\beta(\ell) = |y_s(\ell)|/N_c(\ell). \tag{49}$$

The functions $\eta(\ell)$ and $|y_s(\ell)|$ are expected to vary slowly with scale (and to be constant if the
scaling relation applies, see § 4.1), and the quantity $A(\ell)$ is a normalization factor (the best fit
values of $\eta(\ell)$ and $|y_s(\ell)|$ can be found in CBS).

Figure 8: Logarithm of the quantity $N^2 P_N(\ell)$, measured in our CDM sample, as a function of $N$.
Each curve corresponds to a given choice of scale. The scale decreases from the top curve (for
which we have $\log_{10}\ell = -1.0$) to the bottom curve (for which we have $\log_{10}\ell = -2.6$) with a
constant logarithmic step $\Delta\log_{10}\ell$. The smooth lines are the analytical fit given by eq. (48). This
figure is extracted from CBS.

Let us now turn to the $Q$-body averaged correlation functions $\bar\xi_Q$. When the order $Q$ increases,
the weight given to high density regions is larger and larger and the defects described above, particularly
the cutoff of the CPDF at $N_{\rm max}$, have a stronger and stronger influence on the measurement of
$\bar\xi_Q$. Such a measurement would be more realistic if one extended the exponential tail (48) to infinity.
To illustrate this point, figure 9 displays in logarithmic coordinates the quantities $S_Q = \bar\xi_Q/\bar\xi_2^{\,Q-1}$,
$3 \leq Q \leq 5$, as functions of $\ell$, measured in $E$ and in a larger CDM realization $E_L$ (with a box
size $L_{\rm box} = 90\,h^{-1}$ Mpc). The underlying statistics in $E$ and $E_L$ are theoretically exactly the same,
but the set $E_L$ is of much larger size than $E$ compared to the correlation length and is thus expected
to be much less contaminated by finite volume effects. The solid curves with triangles and
squares correspond respectively to direct measurements of functions $S_Q$ in the sets $E$ and $E_L$. They
undoubtedly disagree with each other. But once finite volume effects in $E$ have been corrected
by extending the exponential tail at large $N$ of the CPDF to infinity, one obtains a much better
agreement (dashed curves). The same procedure applied to $E_L$ does not significantly change the
results. The above discussion shows that the method proposed by CBS to correct for finite volume
effects is efficient (more tests are made in CBS to assess the viability and the uncertainties of the
method). Note that, in the sample $E$, the plateau exhibited by $S_Q$ in the highly non linear regime
($\bar\xi_2 \gg 1$) is hidden by finite volume effects before correction.

Figure 9: Logarithm of the quantity $S_Q = \bar\xi_Q/\bar\xi_2^{\,Q-1}$, $3 \leq Q \leq 5$, measured in our CDM sample $E$ as
a function of $\ell$. The order $Q$ increases with $S_Q$. The filled triangles refer to $E$, the open triangles
correspond to the same quantity, but corrected for finite volume effects, and the squares refer to
the larger CDM sample $E_L$. Consistency between $E$ and $E_L$ is only ensured after correction. This
figure is extracted from CBS.
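
The sensitivity of high-order moments to a cutoff of the CPDF can be illustrated with a toy model. The sketch below is not the correction procedure of CBS: it simply assumes a purely exponential count distribution, $P_N \propto \exp(-\beta N)$ (i.e., $\eta = 0$ in the tail model), cuts it at a hypothetical $N_{\rm max}$, renormalizes it, and compares the moments with those of the untruncated distribution. The values of $\beta$ and $N_{\rm max}$ are arbitrary.

```python
import math

def exponential_moment(k: int, beta: float, n_max=None, n_cap: int = 5000) -> float:
    """k-th moment of P_N proportional to exp(-beta*N), optionally cut at n_max.

    The distribution is renormalized after the cut, mimicking a measured CPDF
    that vanishes above the richest cell of a finite sample. n_cap stands in
    for infinity (the neglected weight is of order exp(-beta*n_cap)).
    """
    upper = n_cap if n_max is None else min(n_max, n_cap)
    weights = [math.exp(-beta * n) for n in range(upper + 1)]
    norm = sum(weights)
    return sum(n**k * w for n, w in enumerate(weights)) / norm

beta = 0.05    # hypothetical decay rate of the tail
n_max = 60     # hypothetical cutoff imposed by the finite volume
ratios = {
    k: exponential_moment(k, beta, n_max) / exponential_moment(k, beta)
    for k in (2, 3, 4)
}
for k, ratio in ratios.items():
    print(f"order {k}: truncated / full moment = {ratio:.2f}")
```

The truncated moments are systematically lower than the full ones, and the deficit grows with the order, which is the qualitative behavior described above for the $\bar\xi_Q$.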

4.2.2 Misleading effects on function h 

In practice, the regime defined by equation (41) is never reached, even in very large samples such
as our CDM sample, which involves more than $2 \times 10^5$ points. In three-dimensional galaxy catalogues,
which contain at most a few thousand objects, $\ell_c$ is close to $\ell_0$ (at best of order $\ell_0/10$) and the
constraint $\ell_c \ll \ell \ll \ell_0$ is not satisfied. Then, the asymptotic regime (9) is hardly reached and the
formalism of BS may be useless in this case.

Hence, let us assume that we have a sample of finite volume at our disposal, involving a finite
number of objects and being a realization of a scale invariant underlying distribution. We evaluate
here deviations from the invariance property (9) brought by the finiteness of the sampled volume
and related to the fact that instead of equation (41), one practically has

$$\ell_c \lesssim \ell \lesssim \ell_0, \qquad N \gtrsim 1, \qquad N \gtrsim N_v. \tag{50}$$

To study separately the various misleading effects that can hide function $h$, we formally write
(although the two terms $\Delta_{\rm dis}$ and $\Delta_{\rm fin}$ explained hereafter can be correlated with each other)

$$h(N,n,\ell) = U(N,\ell,n)\,h(x), \tag{51}$$

with

$$U(N,\ell,n) = [1 + \Delta_{\rm dis}(N,n,\ell)]\,[1 + \Delta_{\rm fin}(N,\ell)]. \tag{52}$$

The first factor of equation (52), $1 + \Delta_{\rm dis}$, accounts for discreteness effects and for the deviation
from the scaling behavior of the system in underdense regions ($N \sim N_v$). It has been evaluated
by BSD, assuming that function $\sigma(N_c)$ has reached its expected power-law behavior (22) at large
Nc (with the notations of BSD, we have 1 + Adis = l/a^)- Note that the quantity A^is computed 
by BSD depends on the parameter $\omega$ defined in equation (22). In Appendix C.1, we study $\Delta_{\rm dis}$ in
terms of moments of the CPDF (without making any assumption on function $\sigma$). For example, in
the vicinity of $N \sim N_c$, the quantity $1 + \Delta_{\rm dis}$ should be roughly of the same order as

$$1 + \Delta_{\rm dis}(N,n,\ell) \sim 1 + 1/N_c + 1/\bar\xi_2, \qquad N \simeq N_c. \tag{53}$$

This estimate is only very approximate, because it does not take into account the complex behavior
of $\Delta_{\rm dis}$ as a function of $N$. Of course, in the limit $N \gg 1$, $N \gg N_v$ and $\ell_c \ll \ell \ll \ell_0$,
$\Delta_{\rm dis}$ vanishes.

The second factor, $1 + \Delta_{\rm fin}$, is related to finite volume effects, which are discussed at length in
the previous section. Because of the finiteness of the sampled volume, the CPDF presents irregularities
at large $N$ followed by a sharp cutoff at $N = N_{\rm max}(n,\ell)$, instead of the smooth exponential tail
predicted at large $x$ for function $h$ (eq. [45]). These effects have been carefully studied by CBS. A
cutoff at large $N$ on the CPDF changes the overall normalizations. Indeed, $1 + \Delta_{\rm fin}$ is given, in the
vicinity of $N_c$, by

$$1 + \Delta_{\rm fin}(N,\ell) \simeq 1 \Big/ \int_0^{N_{\rm max}/N_c} x^2 h(x)\,dx > 1. \tag{54}$$

This result is valid only if the two-body correlation is not affected by finite volume effects when
$\ell \lesssim \ell_0$, which should be the case for any reasonable sample. When the sample size gets larger, the
ratio $N_{\rm max}/N_c$ increases. In the limit of an infinitely large sample, $1 + \Delta_{\rm fin}$ of course goes to unity.
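
As an illustration of the size of this factor, the sketch below evaluates it in closed form for the exponential function $h(x) = 4\exp(-2x)$ quoted for the fractal $F$ in the caption of Fig. 10. It assumes that the finite volume factor is the inverse of $\int_0^{x_{\rm max}} x^2 h(x)\,dx$ with $x_{\rm max} = N_{\rm max}/N_c$; for this $h$ the integral equals $1 - e^{-2x_{\rm max}}(1 + 2x_{\rm max} + 2x_{\rm max}^2)$. The function name and the sampled values of $x_{\rm max}$ are illustrative.

```python
import math

def one_plus_delta_fin(x_max: float) -> float:
    """Inverse of the truncated second moment of h(x) = 4 exp(-2x) on [0, x_max]."""
    truncated = 1.0 - math.exp(-2.0 * x_max) * (1.0 + 2.0 * x_max + 2.0 * x_max**2)
    return 1.0 / truncated

factors = {x_max: one_plus_delta_fin(x_max) for x_max in (1.0, 2.0, 5.0, 20.0)}
for x_max, factor in factors.items():
    print(f"N_max/N_c = {x_max:4.1f}: 1 + Delta_fin = {factor:.4f}")
```

The factor is large when the cutoff sits near $N_c$ and tends to unity once $N_{\rm max}/N_c$ reaches a few units, in line with the limit of an infinitely large sample.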



4.3 Is there a function h or not? 

We can see, from the results of the previous section, that the measurement of function $h$, in the framework
of the scaling model, is quite delicate. When the above defects are important, it may be very
difficult to distinguish between a sample which obeys the scaling property and a sample which does
not. Here, we measure function $h$ on the two reference samples $E_c$ and $F$, and on the more realistic
set $E$ and its various subsamples obtained by random dilution. Our aim is to show that a careful
measurement of function $h$ allows one to successfully test the predictions of BS, if the considered
sample is rich enough, which is not yet the case for current three-dimensional observational galaxy
catalogs.

Following BSD and BH, our practical implementation of the asymptotic domain $D_h$ (see
eq. [41]) is

$$D_h = \{N_c > 1.8,\ \ell/\ell_0 < 0.4,\ N > 1,\ N > 4N_v,\ N P_N > \epsilon\}. \tag{55}$$

The additional last condition amounts to removing the very large $N$ part of the CPDF,
where the latter is dominated by finite volume effects (cf. BSD and BH). To enlarge the available
dynamic range, we divide, when possible, the measured function $h$ by the factor $1 + \Delta_{\rm dis}$ computed
by BSD, to correct for discreteness effects. This explains why we take $N > 1$ instead of $N \gg 1$. Of
course, we can still expect significant differences between the measured function $h$ and the sought function
$h(x)$, due to the factor $1 + \Delta_{\rm fin}(N,\ell)$.

This section is organized as follows: in § 4.3.1, we measure function $h$ in the sets $E_c$ and $F$. We
want to check whether these two reference samples give the expected results, knowing that $F$ obeys
the scaling relation and that $E_c$ does not. In § 4.3.2, we measure function $h$ in the CDM
sample $E$ and its various randomly extracted subsamples. The idea is to dilute $E$ to reach the
same number density $n$ as in current observational galaxy samples. This section is concluded by
§ 4.3.3, where we give constraints on $n$ to detect a function $h$ (when it exists) and to fairly measure
it.

Figure 10: Function $x^2 h(N,\ell) = N^2 P_N(\ell)/(n\ell^3)$ as a function of $x = N/N_c$, as measured in the
fractal $F$, in logarithmic coordinates. Only the values $(N,\ell) \in D_h$ (where $D_h$ is defined by equation
[55]) have been selected. The dashed curve is the theoretical expectation $h(x) = 4\exp(-2x)$.

4.3.1 The reference samples 

The samples $E_c$ and $F$ involve a large number of points, $N_{\rm tot} = 32\,768$. In these sets, we have
$\ell_c/\ell_0 \simeq 0.016$, so the effective dynamic range $D_h$ should be large.

Figure 10 displays the quantity $x^2 h(N,\ell)$ ($n = 32\,768$) measured on $F$ as a function of $x = N/N_c$
for $(N,\ell) \in D_h$ (we take here $\omega = 1$, as suggested by the theoretical calculation of Appendix A and
the measurement of function $\sigma$ in § 3.3, so $N_v = 0$). Note that if we took all the available values
of $(N,\ell)$, the result would be very similar, which explains why we do not show the corresponding
panel. The function $h$ is unambiguously detected: all the curves have the same shape and superpose
very well. The dashed line is the theoretical expectation, which is in good agreement with the
measurement.
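
One can check numerically that the theoretical expectation $h(x) = 4\exp(-2x)$ quoted in the caption of Fig. 10 satisfies the two normalization conditions $\int_0^\infty x\,h(x)\,dx = 1$ and $\int_0^\infty x^2 h(x)\,dx = 1$. The sketch below uses a plain composite Simpson rule; the upper integration limit of 40 is an arbitrary stand-in for infinity, and the helper names are illustrative.

```python
import math

def h_fractal(x: float) -> float:
    """Theoretical h(x) quoted for the fractal F."""
    return 4.0 * math.exp(-2.0 * x)

def simpson(f, a: float, b: float, steps: int = 20000) -> float:
    """Composite Simpson integration of f on [a, b] (steps forced to be even)."""
    if steps % 2:
        steps += 1
    dx = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * dx)
    return total * dx / 3.0

first_moment = simpson(lambda x: x * h_fractal(x), 0.0, 40.0)
second_moment = simpson(lambda x: x * x * h_fractal(x), 0.0, 40.0)
print(f"int x h dx = {first_moment:.6f}, int x^2 h dx = {second_moment:.6f}")
```

Both integrals come out equal to unity to numerical precision, as they should for an admissible function $h$.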

The left panel of Fig. 11 shows the quantity $x^2 h(N,\ell)$ ($n = 32\,768$) measured in $E_c$ as a function of
$x = N/N_c$. As expected, the curves are not significantly more tightly grouped than in Fig. 3, which is a
first indication against the existence of a function $h$. The right panel of Fig. 11 is the same as the left panel,
but only the values of $(N,\ell)$ that belong to $D_h$ have been taken into account. Here, we have not
corrected for discreteness, since the value of $\omega$ we get from function $\sigma$ (§ 3.3) would call for a
special treatment that we did not feel necessary to write out. We have however taken $N > 2$ instead of
$N > 1$ to remove the regime completely dominated by discreteness. The scatter of the curves is
much smaller in the right panel of Fig. 11 than in the left panel, which is not surprising since the available
dynamic range has been reduced. This could give the misleading illusion that a function $h$ exists, but
this scatter is about an order of magnitude, much larger than the expected maximal scatter
$S$ that can be inferred from equation (53). We indeed find $\log_{10} S = \max[\log_{10}(1 + \Delta_{\rm dis})] \simeq 0.2$ in
the scale range given by equation (55). We do not take into account here possible finite volume
effects (which are in some way an intrinsic feature of this sample, which contains only one cluster). In
other words we assume $1 + \Delta_{\rm fin} \simeq 1$. The computation of $1 + \Delta_{\rm fin}$ indeed requires guessing a possible
function $h$ (eq. [54]), which is quite difficult for this particular case, as can be seen in Fig. 11.

Figure 11: Function $x^2 h(N,\ell) = N^2 P_N(\ell)/(n\ell^3)$ as a function of $x = N/N_c$, as measured in the set
$E_c$, in logarithmic coordinates. In the left panel, all the available values of $(N,\ell)$ have been taken into
account. In the right panel, we have selected the values $(N,\ell) \in D_h$, where $D_h$ is defined by equation
(55).
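
The order of magnitude of the expected maximal scatter can be recovered from the rough estimate $1 + \Delta_{\rm dis} \sim 1 + 1/N_c + 1/\bar\xi_2$ evaluated at the edge of the domain, $N_c = 1.8$ and $\ell/\ell_0 = 0.4$. The sketch below additionally assumes a power-law $\bar\xi_2 = (\ell/\ell_0)^{-\gamma}$ with $\gamma = 1.8$; this slope is an assumption made only for the illustration, not a value measured in $E_c$.

```python
import math

def log10_discreteness_scatter(n_c: float, xi2: float) -> float:
    """log10 of the rough estimate 1 + 1/N_c + 1/xi2 of the factor 1 + Delta_dis."""
    return math.log10(1.0 + 1.0 / n_c + 1.0 / xi2)

gamma = 1.8                       # assumed logarithmic slope of xi2 (illustrative)
xi2_edge = 0.4 ** (-gamma)        # xi2 at l/l0 = 0.4, using xi2(l0) = 1
log_s = log10_discreteness_scatter(1.8, xi2_edge)
print(f"xi2 at the edge = {xi2_edge:.2f}, log10 S = {log_s:.2f}")
```

The result is a fraction of a decade, well below the order of magnitude of scatter seen in the right panel of Fig. 11.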

4.3.2 A realistic case: dilution effects 

Figure 12 displays the quantity $x^2 h(N,n,\ell)$ as a function of $x = N/N_c$ for our CDM sample $E$
and various diluted subsamples extracted at random, involving from 32 768 to 512 objects. In the left
panels, all available values of $(N,\ell)$ are represented. In the right panels of Fig. 12, only the values
$(N,\ell) \in D_h$ have been taken into account, and function $h$ has been divided by the factor $1 + \Delta_{\rm dis}$
computed by BSD to correct for discreteness effects. The dashed curve in the right panels is the
phenomenological fit proposed by BSD (eq. [47]).

In $E$, we have $\ell_c/\ell_0 \simeq 0.016$, as in $E_c$ and $F$. To some extent, one can therefore compare $E$
to $E_c$ and $F$. The curves in the right panels of Fig. 12 all have the same regular shape and superpose
quite well. Their scatter is of the same order as in Fig. 10 and hence smaller than in the right panel
of Fig. 11: the existence of a function $h$ is unquestionable. However, when $E$ is randomly diluted,
the dynamic domain $D_h$ is reduced, as well as the range of sampled values of $x = N/N_c$. In the
subsamples $E_{1024}$ and $E_{512}$, $D_h$ is empty. Thus, when the average number density $n$ decreases,
the measured function $h$ behaves less and less like a function $h(x)$, mainly because of the $1 + \Delta_{\rm dis}$ factor,
which becomes larger and larger. The consequence is that the measurement lies higher and higher
above the dashed curve in Fig. 12 (even when one divides the measured function $h$ by the factor
$1 + \Delta_{\rm dis}$ computed by BSD; this is because the latter is only approximate; see also Appendix C).

4.3.3 Comments 

The above examples show to a large extent that, when they are carefully tested, the predictions
of BS on function $h$ can be put to a discriminating test. But this is true only if $\ell_c \ll \ell_0$, or equivalently
if $N_c(\ell_0) \gg 1$. The measurements on $E$ and its subsamples show that, to detect a function $h$, we
must in practice have $N_c(\ell_0) \gtrsim 30 - 50$, as was the case to determine function $\sigma(N_c)$, a condition
hardly fulfilled by current three-dimensional galaxy catalogs. To fairly determine function $h$, much
more information is needed and $N_c(\ell_0) \gtrsim 80 - 120$ is required. In terms of galaxy number density,
this corresponds to

$$n \gtrsim 0.04 - 0.06\,h^3\,{\rm Mpc}^{-3}. \tag{56}$$

Figure 12: Logarithm of $x^2 h(N,n,\ell)$ as a function of $\log_{10} x$ measured on $E$ (a), $E_{32768}$ (b),
$E_{6634}$ (c), $E_{1024}$ (d) and $E_{512}$ (e). All the available values of $(N,\ell)$ are used in the left panels. In the
right panels, only the values of $(N,\ell) \in D_h$ have been taken into account. For $E_{1024}$ and $E_{512}$, $D_h$ is
empty. We have corrected for discreteness effects at small $N$. The dashed curve is the analytical
fit of BSD (eq. [47]). A shift increasing with decreasing number density can be observed between
the measurement and this dashed curve. It is essentially due to the fact that $\ell_c$ approaches $\ell_0$
as the sample is diluted, which reduces the domain $D_h$ where the behavior of function $h(N,n,\ell)$
as a function $h(N/N_c)$ is expected.



Moreover, the size of the sample has to be large in comparison with the correlation length, otherwise
finite volume effects will make the exponential tail presented by function h at large x undetectable.
Constraint (56) is considerably stronger than equation (40), which definitely argues against sparse sampling
strategies. Of course, the specific case of our CDM sample cannot be generalized to an arbitrary set
of points. The work of Maurogordato et al. (1992) on the SSRS catalogue provides rather good 
indications of the scaling of the CPDF measured in this sample as a function h, although their 
volume limited samples had n at most of order only a few 10~^h~^ Mpc'^. But because ic is close 
to lo, they use a domain D'f^ larger than which is given by equation (55). As it would be the case 
if we measured the function h in -^1024 [^^^ panels (e) and (f) of Fig. 12] in D'f^, their 

function h is quite noisy, since the scaling as a function h is expected to be less good in this case. 
Furthermore, because of the small size of their volume limited sample, their measurement is quite 
contaminated by finite volume effects. 

5 Summary and conclusions 

We have measured the count probability in three samples, i.e., a power-law spherical cluster
embedded in a poissonian distribution involving 32 768 objects, a fractal F generated by a Rayleigh-Levy
random walk involving 32 768 objects, and a CDM sample E involving 262 144 matter particles.

We have evaluated errors due to various spurious effects, such as finite volume effects, discreteness
effects and "grid" effects on the void probability (VPDF) and on the count probability
(CPDF). Our study has been made in the framework of the scaling model. We indeed wanted to
test the practicability and the viability of the formalism of BS. The main results are the following:

5.1 About the VPDF and the function a 

1. The VPDF exhibits little sensitivity to finite sample effects, except at the larger scales where
it rapidly deteriorates. The trustworthy scales are smaller than the scale $\ell_{\rm cut}$ defined by

$P_0(n,\ell_{\rm cut})\, V_{\rm sample}\, \ell_{\rm cut}^{-3} = 1$, (57)

where $V_{\rm sample}$ is the sample volume. Above this scale, the measurement of the VPDF is not
statistically significant. For $\ell \ll \ell_{\rm cut}$ the error on function $\sigma = -n^{-1}\ell^{-3}\ln P_0$ should be roughly
smaller than the uncertainty on the average number density

(58)

where $\bar\xi_2(L_{\rm sample})$ is the average of function $\xi_2(r_1 - r_2)$ over the sampled volume. This error
is expected to be very small.

2. However, when one wants to test the scaling model, the quantity $\sigma$ is studied as a function of
$N_c = n\ell^3\bar\xi_2$. This scaling number is rather sensitive to finite volume effects. It is indeed
likely to be systematically underestimated by a direct measurement at large scales (CBS).
The finite volume error on this number can be approximated by

(59)
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This evaluation assumes that the hierarchical model applies (see Appendix B.3). It does not 
give the amount of the systematic underestimation quoted above but rather estimates the
uncertainty on the overall normalization of $N_c$.

3. In numerical simulations, a supplementary spurious effect is expected if the initial distribution
of particles was a slightly perturbed regular pattern. This way of setting initial conditions,
good from a numerical point of view because it minimizes white noise at small scales, is bad
from a statistical point of view, since it introduces some unexpected correlations, which we call grid
effects. The consequence is that the VPDF is underestimated, particularly at large scales.
Grid effects should be negligible only if

$P_0(n,\ell) \gtrsim e^{-1}$, (60)

which is a rather severe restriction.

4. Once all spurious effects are known and have been isolated, the scaling model can be reliably
tested on the function $\sigma$. The procedure consists in measuring the quantity $\sigma(n,\ell)$ as a
function of $N_c$ in the sample and in various randomly extracted subsamples. This can also
be done analytically, using equation (27). Then, if the scaling relation applies in a given
scale range, which can be determined by carefully measuring the low-order correlations (CBS,
§ 4.2.1), a unique function $\sigma(N_c)$ can be determined.

5. To have significant information on the asymptotic behavior of function $\sigma$, large values of
$N_c$ are needed. We find with practical measurements that $N_c$ should at least verify

$N_c(\ell_0) \gtrsim 30 - 40$, (61)

where $\ell_0$ is the correlation length, for which $\bar\xi_2(\ell_0) = 1$. In terms of galaxy number density,
this gives $n \gtrsim 0.02 - 0.03\, h^3\,{\rm Mpc}^{-3}$. This number is hardly reached by current volume limited
three-dimensional galaxy catalogs. This strongly argues against sparse sampling strategies.
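The measurement procedure of item 4 can be sketched as follows, under simplifying assumptions that are ours and not those of the analysis above: cubic cells of side $\ell$ thrown at random in a periodic unit box, a small poissonian test sample instead of a clustered one, and illustrative function names. For such a sample one expects $\sigma \simeq 1$ both before and after random dilution.

```python
import math
import random


def void_probability(points, cell_size, n_cells, rng):
    """Fraction of randomly thrown cubic cells (periodic unit box) that are empty."""
    empty = 0
    for _ in range(n_cells):
        cx, cy, cz = rng.random(), rng.random(), rng.random()
        occupied = any(
            (x - cx) % 1.0 < cell_size
            and (y - cy) % 1.0 < cell_size
            and (z - cz) % 1.0 < cell_size
            for x, y, z in points
        )
        if not occupied:
            empty += 1
    return empty / n_cells


def sigma_estimate(points, cell_size, n_cells, rng):
    """sigma = -ln(P0) / (n l^3), with n the number density in the unit box."""
    p0 = void_probability(points, cell_size, n_cells, rng)
    if p0 <= 0.0:
        raise ValueError("no empty cell found: scale too large for this sample")
    return -math.log(p0) / (len(points) * cell_size ** 3)


def random_dilution(points, keep_fraction, rng):
    """Keep each object independently with probability keep_fraction."""
    return [p for p in points if rng.random() < keep_fraction]


rng = random.Random(12345)
sample = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
sigma_full = sigma_estimate(sample, 0.08, 1000, rng)
diluted = random_dilution(sample, 0.5, rng)
sigma_diluted = sigma_estimate(diluted, 0.08, 1000, rng)
print(sigma_full, sigma_diluted)
```

On a clustered sample the same two calls, repeated over a range of cell sizes and dilution factors, would give the set of points $(N_c, \sigma)$ on which the scaling is tested; $N_c$ itself requires a separate measurement of the averaged two-body correlation.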

5.2 About the CPDF and function h 

If the scaling relation applies, the function $h(N,n,\ell) = N_c^2 n^{-1}\ell^{-3}P_N(n,\ell)$ should scale as a
universal function $h(N/N_c)$, which is characteristic of the underlying continuous density field and
has some specific properties computed by BS. However, this behavior is only asymptotic, for a
large number of sampling points, and the function $h(N,n,\ell)$ is never exactly equal to function
$h(x)$, which makes the detection and the measurement of function h somewhat delicate.

1. At small N, discreteness effects dominate, so $N \gg 1$ is required. Moreover, underdense regions,
which have a different scaling behavior, contaminate the measurement, so $N \gg N_v$ is required.
But such constraints are difficult to follow in realistic samples. Fortunately, BSD found a way
to correct for discreteness and the influence of underdense regions, so that one can enlarge
the available dynamic range to $N \geq 1$ and $N \gtrsim N_v$. However, their correction needs some
improvements (i.e., must be generalized with weaker hypotheses, Schaeffer et al. 1994).

2. Real samples are of finite volume. The consequence is a sharp cutoff at large N on the CPDF.
It implies that low order moments of the CPDF are likely to be underestimated by direct
measurements. CBS have proposed a method to correct for such defects, which consists in
extending to infinity the exponential tail presented by the CPDF at large N. One consequence
of finite volume effects is also a change in the overall normalization of function h, which is
larger than expected in the vicinity of $N \sim N_c$. Unfortunately, the corresponding factor
$1 + \Delta_{\rm fin}$ between functions $h(N,n,\ell)$ and $h(x)$ is easy to evaluate only if the scaling relation is obeyed,
since one has to know the function h to compute it.

But once these defects have been carefully taken into account, it is seen that the scaling model
formalism can be used. However, to reliably decide whether a function h exists or not, condition (61) is
required. To measure this function without having to introduce strong corrections, we must require

$N_c(\ell_0) \gtrsim 80 - 120$. (62)

In terms of galaxy density, we thus must roughly have $n \gtrsim 0.04 - 0.06\, h^3\,{\rm Mpc}^{-3}$, a constraint that
is not fulfilled by current volume limited three-dimensional galaxy catalogs.

5.3 Perspectives 

Our count probability cook book is far from being complete, since we have not taken into account 
observational effects, such as selection and redshift effects. Moreover, we have only studied the 
specific case of the scale invariant model of BS. Other models, such as the lognormal distribution 
(Coles & Jones 1991), the negative binomial model and the thermodynamic model of Saslaw and 
Hamilton (1984) also provide very good fits of the measured count probability in the observed galaxy 
distribution (Crane & Saslaw, 1986, 1988). Actually, all these models were seen by Bouchet et al. 
(1991, 1993) to become indistinguishable in the scaling regime available in the galaxy distribution. 
As argued by Bouchet et al., this is certainly because the current galaxy catalogs are too sparse
and largely dominated by their discrete nature (so conditions [61] and [62] are of course not fulfilled),
but this issue has to be further clarified.
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APPENDIX 

A distribution function of our fractal F 

Here we compute the statistical properties of our fractal F. First, we compute the Q-body
correlation functions, redoing in a slightly different way a calculation of Peebles (LSS), and show
that this set obeys the scaling relation. Then, we try to evaluate functions $\sigma(y)$ and $h(x)$.

A.1 Computation of the Q-body correlations

The mean number density n of a real fractal generated by an infinite Rayleigh-Levy random walk
diverges. Moreover, such a set has no correlation length, i.e., $\bar\xi_2$ is infinite. By imposing that F is
a replicated cubical box involving a finite number of objects, we ensure the finiteness of n and $\bar\xi_2$.

To test a more realistic situation, we follow LSS and assume here that our set F is generated by
a distribution of finite random walks $W_i$, with a random number $N_i$ of steps, started at a random
place of the sample, so that the average number density $n_W$ of chains verifies $n = n_W \langle N_i \rangle$.

Mandelbrot (1975) and Peebles (LSS, p. 245) give the probability density $f(r)$ of a displacement
r after any number of steps in a given walk $W_i$. With the notations of § 2.3, we have



(6+l)(6-l) 

C= ^2 tan 



f{r) = Cr-\ 7 = 3-e, e < 2, (63) 

with 

-L ^\(^ - ^\ r -n- i 

(64) 

■y/i ~ L ^ . " 

In the following, we shall use the notation

$r_{l,m} = |r_l - r_m|$. (65)

We now want to compute the two-body correlation function. Let $\delta P$ be the probability of
finding an object in both of the volume elements $\delta V_1$ and $\delta V_2$. The probability that the two objects
have been generated by the same walk $W_i$ is $\delta P_1 = [f(r_{1,2}) + f(r_{2,1})]\, n\, \delta V_1 \delta V_2$. The first term says
that object 2 has been generated after object 1 with a probability $f(r_{1,2})\delta V_2$ (and the
probability of existence of object 1 is $n\delta V_1$), but the reverse way is possible (term with $r_{2,1}$). The
probability that the two objects have been generated by $W_i$ and $W_j$ with $i \neq j$ is $\delta P_2 = n^2 \delta V_1 \delta V_2$,
since $W_i$ and $W_j$ are not correlated with each other. By definition of the two-body correlation, we have
$\delta P = [1 + \xi_2(r_{1,2})]\, n^2 \delta V_1 \delta V_2 = \delta P_1 + \delta P_2$. So we get

$\xi_2(r_1, r_2) = 2f(r_{1,2})/n = (2C/n)\, r_{1,2}^{-\gamma}$. (66)

Let us compute the three-body correlation function. Let $\delta P$ be the probability of finding three
objects respectively in the three volume elements $\delta V_1$, $\delta V_2$ and $\delta V_3$. We then have three possibilities.
The first one is that the three objects have been generated by the same walk $W_i$; the probability of
such an event is $\delta P_1 = [f(r_{1,2})f(r_{2,3}) + {\rm cyc.} + {\rm rev.}]\, n\, \delta V_1\delta V_2\delta V_3$ (the term "cyc." takes into account
all the possible paths that can cross the three objects (three terms), and "rev." indicates that
the reverse way is possible). The second possibility is that two objects have been generated by a
walk $W_i$ and the other object has been generated by $W_j$ with $i \neq j$; the probability of such an
event is $\delta P_2 = [f(r_{1,2}) + f(r_{2,3}) + f(r_{3,1}) + {\rm rev.}]\, n^2 \delta V_1\delta V_2\delta V_3$. The third possibility is that the three
objects have been generated by three different walks; its probability is $\delta P_3 = n^3 \delta V_1\delta V_2\delta V_3$. By
definition of the three-body correlation function $\xi_3$, we have
$\delta P = [1 + \xi_2(r_{1,2}) + \xi_2(r_{2,3}) + \xi_2(r_{3,1}) + \xi_3(r_{1,2}, r_{2,3}, r_{3,1})]\, n^3 \delta V_1\delta V_2\delta V_3 = \delta P_1 + \delta P_2 + \delta P_3$. So we get

$\xi_3(r_1, r_2, r_3) = [\xi_2(r_{1,2})\xi_2(r_{2,3}) + {\rm cyc.}\ (3\ {\rm terms})]/2$. (67)

Let us compute the Q-body correlation function. Let $\delta P$ be the probability of finding Q objects
respectively in the Q volume elements $\delta V_1, \ldots, \delta V_Q$. As we did for the two-body and the three-body
correlation functions, we have to find all the possible combinations of sets $W_i$ and paths in $W_i$.
Let $\mathcal{V}$ be the set of partitions of the set $\{1, \ldots, Q\}$. An element of $\mathcal{V}$ is written $i = (i_1, \ldots, i_s)$,
with $i_k = \{i_{k,1}, \ldots, i_{k,{\rm card}(i_k)}\}$. With these definitions, we thus have $i_k \cap i_l = \emptyset$ if $k \neq l$, and
$i_1 \cup \ldots \cup i_s = \{1, \ldots, Q\}$. In the following, we use the notation ${\rm card}(i_k) = c_k$. Generalizing the above
procedure, we find

$\delta P = \delta V_1 \cdots \delta V_Q \sum_{(i_1,\ldots,i_s) \in \mathcal{V}} \prod_{k=1}^{s} n \sum_{{\rm permutations}\ j[i_k]} f_{j_1,j_2} \times \cdots \times f_{j_{c_k-1},j_{c_k}}$, (68)

with

$\sum_{{\rm permutations}\ j[i_k],\ c_k = 1} f_{j_1,j_2} \times \cdots \times f_{j_{c_k-1},j_{c_k}} = 1$, (69)

and $f_{l,m} = f(r_{l,m})$. Now we have

$\delta P = n^Q \delta V_1 \cdots \delta V_Q \sum_{(i_1,\ldots,i_s) \in \mathcal{V}} \prod_{k=1}^{s} \xi_{c_k}(r_{i_{k,1}}, \ldots, r_{i_{k,c_k}})$, (70)

with $\xi_1 = 1$. It is then easy to recursively compute the Q-body correlation function:

$\xi_Q(r_1, \ldots, r_Q) = [\xi_2(r_{1,2})\xi_2(r_{2,3}) \cdots \xi_2(r_{Q-1,Q}) + {\rm permutations\ in}\ \{1, \ldots, Q\}\ ({\rm i.e.,}\ Q!\ {\rm terms})]/2^{Q-1}$. (71)

This result was already derived (in the framework of the hierarchical model) by Hamilton & Gott
(1988). Our set therefore obeys the scaling relation (2).

A.2 Computation of the count probability

In the following, we assume that the cells are spherical. To compute the function $P_N(\ell)$, we first need
to calculate the averaged Q-body correlation function $\bar\xi_Q(\ell)$, given by equation (3). We will
then be able to compute functions $\sigma(y)$ and $h(x)$.

A.2.1 Averaged Q-body correlations

Equations (66), (71) imply 



with 



W = Sq^2~\ (72) 



Sq = Sq^q, (73) 
Sq = 2'-'^Q\, (74) 

= (75) 



and S is the sphere of radius unity. The value of $g should be close to unity, as argued by BS 
and Bernardeau & Schaeffer (1992, hereafter BeS), because the correlation function structure (71) 
obeys the hierarchical model considered by these authors. In spherical coordinates, equation (76) 
can be written (7 < 2 and (J > 3) 

Q /■! 



with 



2 /3\ t 

= [(2-7)(3-7)]Q-i (2; Jo ^^2-c?rQ_i/i(r2)/2(r2,r3).../2(r3,rQ_i)/i(rQ_i), (77) 

/i(r) = (1 + rf--^ - (1 - rf--^ " ^ [(1 + ^f"" " (1 " ^f""] > (78) 

/2(ri, r2) = (3 - 7) [(ri + r2f-^ - \n - ^2!'"^] • (79) 

$J_2$ can be analytically estimated (LSS) and is equal to $72/[(3 - \gamma)(4 - \gamma)(6 - \gamma)2^{\gamma}]$. $J_3$ can be
reduced to a one-dimensional integral, $J_4$ and $J_5$ to two-dimensional integrals, $J_6$ and $J_7$ to three-dimensional
integrals, and so on. We numerically estimated $J_Q$ for $Q \leq 6$. We take $\gamma = 1.8$, as
chosen to construct our fractal F. The values are listed in Table 1. At low-order, we see that $g 
is close to unity, but $g — 1 increases with Q. 

A.2.2 Evaluation of function $\sigma(y)$ and function $h(x)$

Using equations (4), (10) and (5), we write $\sigma(y)$ as the series expansion

$\sigma(y) = \sum_{N=1}^{\infty} (-1)^{N-1} \frac{S_N}{N!}\, y^{N-1}$, (80)

with $S_1 = S_2 = 1$. For our fractal, this expression reads

$\sigma(y) = \sigma_0(y) + \delta\sigma(y)$, (81)

with

$\sigma_0(y) = \left(1 + \frac{y}{2}\right)^{-1}$, (82)

00 

M2/)= E(-2)'"'^(*iV-l)2/^-\ (83) 
Ar=i 

and $1 = 1. 

The problem is now to evaluate $\delta\sigma(y) = 0.028(y/2)^2 - 0.064(y/2)^3 + O(y^4)$. In the regime
$y \ll 1$, we have, as expected, $|\delta\sigma(y)| \ll \sigma_0(y)$, which is much less obvious for $y \gtrsim 1$.

However, the quantity $g seems to be well approximated by the following phenomenological fit 

$Q ~ $3<^'^"^ 3 < g < 6, (84) 

with $3 ~ 1.028 and 6 ~ 1.036 (see Table 1). Let us suppose that, in a first approximation, 
equation (84) is valid for any Q. With the above assumptions, we have 

$3/^^ , $3^ $3^2/ f^r,^ 

(J[y) — ; — h (1 — ^) - (1 t)-- (85) 

With our numerical values we find that

$\sigma(y) \simeq \frac{0.958}{1 + y/1.93} + 0.0422 - 0.0039\, y$. (86)

This expression is valid in practice for $y \lesssim 10$. In this regime, we have $|\delta\sigma| \ll \sigma_0$.
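The numbers entering the fit (86) can be cross-checked with a short sketch. It assumes, as our reading of the garbled intermediate equations, that the coefficients of the series (80) are $N!/2^{N-1}$ times the ratios of Table 1, that these ratios equal 1 for $N \leq 2$ and $1.028 \times 1.036^{N-3}$ for $N \geq 3$, and that the resulting geometric series is resummed. The function and variable names are illustrative.

```python
def sigma0(y):
    """Reference function sigma_0(y) = 1 / (1 + y/2), equation (82)."""
    return 1.0 / (1.0 + y / 2.0)


def sigma_fit(y):
    """Fit quoted in equation (86)."""
    return 0.958 / (1.0 + y / 1.93) + 0.0422 - 0.0039 * y


def resummed_coefficients(ratio3, b):
    """Coefficients of A / (1 + y/s) + c - d*y obtained by resumming the series.

    Assumes ratios 1, 1, ratio3, ratio3*b, ratio3*b**2, ... for N = 1, 2, 3, ...
    """
    amplitude = ratio3 / b ** 2
    scale = 2.0 / b
    constant = 1.0 - amplitude
    slope = (1.0 - ratio3 / b) / 2.0
    return amplitude, scale, constant, slope


coeffs = resummed_coefficients(1.028, 1.036)
print(coeffs)

# Largest relative departure of the fit from sigma_0 over 0 <= y <= 10.
max_rel = max(abs(sigma_fit(0.1 * k) - sigma0(0.1 * k)) / sigma0(0.1 * k)
              for k in range(101))
print(max_rel)
```

Under these assumptions the four coefficients agree with those of equation (86) to the quoted precision, and the departure from $\sigma_0$ stays at the level of a few percent up to $y = 10$.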

But equation (86) was derived in a somewhat ad hoc way and it does not sample the large
values of y well. A better way of evaluating the difference between $\sigma$ and $\sigma_0$ consists in using the method
explained in appendix A of BeS. This method is valid here since our fractal obeys the hierarchical
model used by BeS. Let us rewrite their equations (A13), (A14), (A15), (A16) as

a{y) = - I d\s{y), (87) 
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where i; = ^'^ is the volume of a (spherical) cell centered on origin, 

.(r) = C[r(r)]-lr(r)C'[r(r)], (88) 

where C'{t) = d(/dT, 

r(r)=-l fd\'^-^ar(r')). (89) 

The function Cij) depends on the hierarchy of the Q-ho&j correlation functions. Using equations 
(2), (4) and (5) of BeS, we find, for our fractal, 

C{r) = l-r+K\ (90) 

We now want to show that function (j{y) is not very different from (Jo{y) for any value of y. Using 
(90), we see that function s obeys the implicit equation 

.(r) = 1-11 d\'^-^s{.'). (91) 

It is then useful to write s(r) and ^2(1", r') in a perturbative way: 

5(r) = so + 6s{r), (92) 

6(r,r')=?2[l + '^^(r,r')]- (93) 
Using equations (87), (91), one of course gets 

50 = (94) 

^.(r) = -f / d\'Ss{r') - f ao / d\'Sar,r') - f / d\'Sar,r')Ss{r'), (95) 

Jv Jv Jv 

6a = -f d^r6s(r). (96) 

V Jv 

Equations (95) and (96) permit to recursively compute Sa: 



A2S(y)^ 
1 + f + A2S(2/)f 



6a{y) = ao ' (97) 



where 

AT 



00 A / \ rJ 

.W^E^(f) , (-) 



Ar=o 

= J dW..d\^+sSUri,2)..M2{rN+2,N+3) = E ( - + 2 - fc) ^^+^-^ " 

" k=0 ^ ' ' 

(99) 

Numerically, we find ^(t/) = l + 0.71(2//2) + 0.54(2//2)2+0.36(2//2)3+O(/) and A2 = $3-1 = 0.028. 
If we neglect the very improbable case in which we would have S(2/) — — (8/A2)(l + yl2)y~^ for 
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Table 1: Values of $g, (J < 6 compared with $3^'^ ^ for 7 = 1.8. 



Q $Q $31-036^-^ 

3 1.028 1.028 

4 1.064 1.065 

5 1.103 1.103 

6 1.145 1.143 



$y \gg 1$, we see through equation (97) that $\sigma$ should always be very close to $\sigma_0$, even when y is
arbitrarily large, which is confirmed by Fig. 6.

In this particular example, the approximation $g ~ 1 is thus very good. It is now easy to 
compute function $h(x)$ from equations (11), (82):

$h(x) \simeq 4\exp(-2x)$. (100)

The above reasoning can also be applied to cubical cells (the quantities $g are simply slightly 
different), for which equations (82) and (100) also apply. 
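The correspondence between equations (82) and (100) can be verified numerically. The sketch below assumes that the relation between the two functions is the usual one of the scaling model, $y\sigma(y) = \int_0^\infty (1 - e^{-xy})\, h(x)\, dx$, which we take to be the content of equation (11); the integration scheme and function names are illustrative choices.

```python
import math


def h(x):
    """Equation (100): h(x) = 4 exp(-2x)."""
    return 4.0 * math.exp(-2.0 * x)


def sigma_from_h(y, x_max=40.0, steps=40000):
    """sigma(y) = (1/y) * integral of (1 - exp(-x y)) h(x) dx, by Simpson's rule."""
    dx = x_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * dx
        if k == 0 or k == steps:
            weight = 1.0
        elif k % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * (1.0 - math.exp(-x * y)) * h(x)
    return total * dx / 3.0 / y


def sigma0(y):
    """Equation (82)."""
    return 1.0 / (1.0 + y / 2.0)


for y in (0.1, 1.0, 10.0):
    print(y, sigma_from_h(y), sigma0(y))
```

The integral can also be done by hand: $\int_0^\infty 4e^{-2x}dx - \int_0^\infty 4e^{-(2+y)x}dx = 2 - 4/(2+y) = y/(1 + y/2)$, which reproduces $\sigma_0(y)$ exactly.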



B Misleading effects on function a 

B.l Grid effects and void probability in numerical simulations 

From a purely statistical point of view, the discrete realization of a smooth continuous density field
should be locally poissonian. On the other hand, it is useful to start a simulation from a slightly
perturbed regular pattern of particles in order to reduce the poissonian noise at small scales, which
contradicts the statistical view. For instance, our CDM sample comes from a 64 × 64 × 64
initial grid of particles. Now underdense regions, where shell-crossing is not likely to take place,
may have conserved the information linked to the initial grid, as can be seen in the left panel of Fig. 1.
Let us see if this modifies the behavior of the VPDF.

To do that, let us dilute E in two different ways (a) and (b), the first one preserving the possible
information associated with a grid, the second one destroying this information. The method to obtain
(a) consists in diluting E so that the remaining particles form, at the beginning of the simulation, a
(slightly perturbed) regular pattern, i.e., an m × m × m grid. We call such a dilution a preventive one,
since it preserves all the properties of the particle distribution (but not all the information). To
obtain (b), we simply have to randomly dilute E. To significantly destroy the possible information
linked to the initial grid, the dilution factor $n/n_i = 64^3/m^3$ must be large, but not too large,
otherwise the subsamples (a) and (b) will become indistinguishable, because both are dominated by
discreteness (i.e., a ~ 1). We took n/rii = 8 (so m = 32). We thus extracted a subsample -B32768 
obeying requirement (a) and three subsamples -B32768) 3 — 1)2,3, obeying requirement (b). 

Figure 13 gives it as a function of as measured on -B32768 (solid curves), E (dashed curve) 
and -Bf2768 (fiUed symbols). Clearly, the functions a measured on -Bf2768 °^ -^32768 differ at 
large (so at large scale). 

One can evaluate the scale for which the random dilution differs from the preventive one, by
comparing the VPDF of a poissonian set with the VPDF of a grid of the same average density. In the
first case we get

$P_0^{\rm Poi}(n,\ell) = \exp(-n\ell^3)$. (101)

In the second case we have

$P_0^{\rm grid}(n,\ell) = \max(1 - n\ell^3, 0)$. (102)

Phenomenologically, we see that the statistics of a grid becomes quite different from the statistics
of a pure random set when $n\ell^3 > 1$, that is when $P_0^{\rm Poi}(\ell) < 1/e$. This simple comparison also
explains why the function a measured in -Bf2768 smaller than the one measured in -B32768) since 
a random dilution leads to a greater void probability than a preventive dilution. 
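The comparison between equations (101) and (102) is easy to tabulate. The following sketch only evaluates the two closed forms; the function names are illustrative.

```python
import math


def p0_poisson(n, cell_size):
    """Equation (101): void probability of a poissonian set."""
    return math.exp(-n * cell_size ** 3)


def p0_grid(n, cell_size):
    """Equation (102): void probability of a regular grid of the same density."""
    return max(1.0 - n * cell_size ** 3, 0.0)


# Unit density, so that n l^3 = l^3; the grid value reaches zero at n l^3 = 1,
# where the poissonian value is exactly 1/e.
for cell_size in (0.5, 0.8, 1.0, 1.2):
    print(cell_size, p0_poisson(1.0, cell_size), p0_grid(1.0, cell_size))
```

Since $1 - x \leq e^{-x}$ for all x, the grid always gives the smaller void probability, and the difference becomes total once $n\ell^3 \geq 1$.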

Let us now look at the more complicated case of our CDM sample, which is a discrete realization
of a continuous underlying density field that can be described by a density contrast $\delta(r)$. For a
"large enough" scale, the void probability will be determined by underdense regions, in which $\delta(r)$
is expected to present smooth variations and to have conserved the information on the initial grid.
Equations (101) and (102) then become

$P_0^{\rm Poi}(n,\ell) \simeq \frac{1}{V}\int_V \exp(-[1 + \delta(r)]\, n\ell^3)\, d^3r$, (103)

$P_0^{\rm grid}(n,\ell) \simeq \frac{1}{V}\int_V \max(1 - [1 + \delta(r)]\, n\ell^3, 0)\, d^3r$. (104)

$P_0^{\rm grid}$ vanishes when $[1 + \delta(r)]\, n\ell^3 > 1$ for any r. Then, $P_0^{\rm Poi}(\ell) < 1/e$. Therefore, it is reasonable
to think that when the measured void probability is less than 1/e, or equivalently, when

$N_v > 1$, (105)

grid effects can significantly affect the measurements. In Fig. 13, the points verifying $N_v > 1$ are
circled (except for E).



B.2 Finite sample effects and void probability 

B.2.1 Calculation of the error due to the sample finiteness 

We aim here to estimate the error on the VPDF associated with finite sample effects. Let us consider
a set $S_{\rm sub}$ of finite volume V, involving a finite number $N_{\rm par}$ of objects. Let us assume that this
set is a subsample extracted from an infinite set $S_{\rm inf}$ of average number density n for which the
hierarchical model applies. In this case, the Q-body correlation function can be written as a sum
of products of $Q - 1$ terms in $\xi_2$ (see Bernardeau & Schaeffer 1992, hereafter BeS)

$\xi_Q(r_1, \ldots, r_Q) \propto \sum \prod_{Q-1} \xi_2(r_i, r_j)$. (106)

This expression is a particular writing of the scaling relation. We will assume, in the following, that
the cells are spherical, of volume $v = 4\pi\ell^3/3 \ll V$.

The VPDF is computed in practice by randomly throwing cells in $S_{\rm sub}$ and by counting the
fraction of empty cells. Let $\tilde P_0$ be a statistical indicator of $P_0$. It can be written

$\tilde P_0 = \frac{1}{C_{\rm tot}} \sum_i \delta_D(N_i)$, (107)
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Figure 13: Logarithm of tr as a function of log;^o -^c as measured in E (dashed curve), in three 
samples -B32768) j — 1)2,3 (solid curves) and in -B32768 (fiUed symbols). Except for the dashed 
curve, the points verifying Pq < 1/e are circled. 

where $N_i$ is the number of objects in the cell i, $1 \leq i \leq C_{\rm tot}$, and $\delta_D(N)$ is the discrete Dirac
function that gives 1 if $N = 0$ and 0 otherwise. We assume here that the error related to counts
in cells is not significant, or in other words that $C_{\rm tot}$ is very large. The real VPDF will be the
ensemble average of $\tilde P_0$ on an infinite number of realizations $S_{\rm inf}$:

$P_0 = \langle \tilde P_0 \rangle_{\rm ens}$. (108)

The standard deviation on $\tilde P_0$ is written

$(\Delta P_0)^2 = \langle \tilde P_0^2 \rangle_{\rm ens} - \langle \tilde P_0 \rangle_{\rm ens}^2$. (109)

We have

$\langle \tilde P_0^2 \rangle_{\rm ens} = \left\langle \frac{1}{C_{\rm tot}^2} \sum_{i,j} \delta_D(N_i)\delta_D(N_j) \right\rangle_{\rm ens}$. (110)

This quantity is, in the limit $C_{\rm tot} \gg 1$, nothing but the probability $P_{0,0}(n,\ell,V)$ that two cells
thrown at random in $S_{\rm sub}$ are empty. We can write

$P_{0,0}(n,\ell,V) = \frac{1}{V^2} \int_V P_{0,0}(n,\ell,r_{1,2})\, d^3r_1 d^3r_2$, (111)

where $P_{0,0}(n,\ell,r)$ is the probability that two cells of volume v separated by a distance r are
empty (isotropy is assumed) and $r_{1,2} = |r_1 - r_2|$.

Using the results in Appendix C of BeS, we can estimate $P_{0,0}(n,\ell,r)$ in the case $r \gg \ell$ (or
equivalently $\xi_2(r) \ll \bar\xi_2$). With the notations of BeS, we have

$P_{0,0}(n,\ell,r) = \chi(0,0)$. (112)

From their equations (A21), (C17), (C18), (C19), we can easily find

$P_{0,0}(n,\ell,r) = \exp\{-2nv[\sigma(N_c) + nv\,\sigma'(N_c)\xi_2(r)]\}$, $r \gg \ell$, (113)
where $\sigma'(N_c) = d\sigma/dN_c$. When $r \leq 2\ell$, the estimation of $P_{0,0}(n,\ell,r)$ is rather complicated, since
the two cells $C_i$ and $C_j$ are connected, forming a non-spherical cell of volume

$v'(r) = 2v - \int_{C_i \cap C_j} d^3r' = v\left[1 + \frac{3}{4}\frac{r}{\ell} - \frac{1}{16}\left(\frac{r}{\ell}\right)^3\right]$. (114)

But let us assume, to simplify the calculation, that the VPDF does not strongly depend on the
shape of the cell (which is true in the poissonian case). Then, we can write

$P_{0,0}(n,\ell,r) \simeq P_0(n,\ell'(r))$, $r \leq 2\ell$, (115)

with $\ell'(r)$ the radius of a sphere of volume $v'(r)$.
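The volume of the connected cell in equation (114) is the volume of the union of two spheres of radius $\ell$ whose centers are separated by r. The sketch below compares the closed form with a direct numerical integration over slices perpendicular to the line of centers. The function names are illustrative, and taking $\ell'(r)$ as the radius of the sphere of equal volume is our reading of equation (115).

```python
import math


def sphere_volume(radius):
    return 4.0 * math.pi * radius ** 3 / 3.0


def union_volume(radius, separation):
    """Closed form for two overlapping spheres (two disjoint spheres if r >= 2l)."""
    if separation >= 2.0 * radius:
        return 2.0 * sphere_volume(radius)
    x = separation / radius
    return sphere_volume(radius) * (1.0 + 0.75 * x - x ** 3 / 16.0)


def union_volume_numeric(radius, separation, steps=20000):
    """Midpoint integration of the cross-section area along the line of centers.

    Both cross-sections are disks centered on the same axis, so the area of
    their union is that of the larger disk.
    """
    z_min, z_max = -radius, separation + radius
    dz = (z_max - z_min) / steps
    total = 0.0
    for k in range(steps):
        z = z_min + (k + 0.5) * dz
        rho2 = max(radius ** 2 - z ** 2, radius ** 2 - (z - separation) ** 2, 0.0)
        total += math.pi * rho2 * dz
    return total


def effective_radius(radius, separation):
    """Radius of the sphere with the same volume as the connected cell."""
    return (3.0 * union_volume(radius, separation) / (4.0 * math.pi)) ** (1.0 / 3.0)


for r in (0.0, 0.5, 1.0, 1.9):
    print(r, union_volume(1.0, r), union_volume_numeric(1.0, r), effective_radius(1.0, r))
```

The closed form reduces to v at r = 0 and to 2v at $r = 2\ell$, as it should.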

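The integral (111) then becomes

$P_{0,0}(n,\ell,V) \simeq \frac{4\pi}{V}\int_0^{2\ell} P_0(n,\ell'(r))\, r^2 dr + \frac{P_0^2}{V^2}\int_{r_{1,2} \geq 2\ell} \exp\left[-2(nv)^2\sigma'\xi_2(r_{1,2})\right] d^3r_1 d^3r_2$. (116)

The first and the second integral on the right-hand side of this equation will be denoted
by $I_1$ and $I_2$. Let us assume that $\xi_2(r)$ is a power law of index $-\gamma$ where $3/2 < \gamma < 3$. $I_2$ can be
approximated by

(117)

where $\bar\xi_2(L)$ is the averaged correlation over the sample:

$\bar\xi_2(L) = \frac{1}{V^2}\int_V \xi_2(r_{1,2})\, d^3r_1 d^3r_2$, (118)

and

$\eta = -2(nv)^2\sigma'(N_c)$. (119)

Integrating (117) leads to

$I_2 \simeq P_0^2\left\{1 - \frac{8v}{V} + \eta\bar\xi_2(L) + \frac{4\pi(2\ell)^3}{V}\sum_{N \geq 2}\frac{[\eta\xi_2(2\ell)]^N}{(\gamma N - 3)N!}\right\}$. (120)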



In the case $1 < \gamma < 3/2$, the result is similar. The difference is that a supplementary term
$\frac{\eta^2 P_0^2}{2V^2}\int_{r_{1,2} \geq 2\ell} \xi_2^2(r_{1,2})\, d^3r_1 d^3r_2$ replaces the $N = 2$ term of the sum on the right-hand side of
equation (120), which then starts at $N = 3$ instead of $N = 2$. In the following, we still assume that
$3/2 < \gamma < 3$, although we think that the final result can be reasonably generalized to $1 < \gamma < 3/2$.

We now want to find reasonable estimates of $I_1$ and $I_2$. To do that, we consider two extreme
cases. The first one (i) corresponds to the regime when $P_0 \simeq 1$, or equivalently $nv\sigma \simeq 0$. The
second one (ii) corresponds to the regime when $P_0 \ll 1$ or equivalently $nv\sigma \gg 1$. We will then
consider two subcases (a) and (b) in (i) and (ii), respectively the "poisson limit" $N_c \ll 1$ ($\sigma \simeq 1$)
and the asymptotic regime (22) expected when $N_c \gg 1$.

(ia) Pq ~ 1, iVc <C 1: in this case, it is easy to compute Ii and I2 at first order in and nva. 
We find 



h 



nv 

Po + ^ 



(121) 
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Since ct ~ 1 — iVc/2 when iVc <C 1 (see eq. [80]), we have ct' ~ —1/2 and 

h 



1 - - + {nvYUL) 



^0 ■ 



(122) 



(ib) Po ~ 1, iVc ^ 1: the calculation of Ii at first order in nva leads to 



anva 



(123) 



where a is a factor depending on the values of w and 7, but it is numerically seen to be close 
to unity (or of the order of a few). We can also write 



7?6(2^) ^ 2u;nva{N,)S^, > 1. 
So 7/^2(2^) <C 1 and I2 is given at first order in nva by 



1 - - - 2{nvfa'UL) 



(iia) Po <C 1, <C 1: The computation of Ii in this regime is easy. One can find 

128/9 



(124) 



(125) 



Po 



(126) 



We have 7/^2(2^) oc nvN^ and nva ~ tci; ^ 1. Thus, the first order dominant term in equation 
(120) is also compatible with equation (122) when nvN^ <C 1 (in this case, /2 — [1 — 8'!;/F]Po 
since i^^-^) ^ ^2^^) ^ When nvN^ ^ 1, we obtain similar conclusions as in item (iib), 
but the quantity B defined hereafter is negligible compared with /iPq-^ - 8(i;/F). 



(iib) Po < 1, iVc > 1: /i is written 



h 



Po 



128/9 



8{Pnva)^ 



V 



Po, 



where /3 = 1 — (3 — 7)^*^/3 is close to unity. I2 is given by 



1 - - + vUL) + B 



) 



where B is of the order of 



B 



V exp(7?6(2^)) 



(127) 



(128) 



(129) 



V 7^6(2^) ■ 

With equation (124) which is also valid in this regime, we see that BPq should be of the 
same order as Ii — 8{v/V)Pq (we cannot make more detailed comparisons, since we are using 
rough approximations). 
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In the regime (i) or nva ~ 0, we thus have 

(APo)2 ~ aanv^Po - 2{nvfa%{L), Pq ~ 1. (130) 
Numerical calculations moreover show that in the regime nvai^N^ ^ 1, /i is well approximated 

by 

h 



yPo, Po k 1/e, (131) 



where a is a factor (not necessarily constant) of the order of unity or at most a few unities (and 
a = 1 exactly in the Poisson case). When Pq is very small, this expression is of course not valid, 
and a logarithmic correction should be taken into account [i.e., a term (^nva)~^ = — (InPo)"'^, see 
equations (126), (127)], but at the level of approximation we are, we shall forget it and we shall 
take a = 1. 

The expected error on function Pq should therefore roughly be 

" - ^(--r-WUL). (132) 

Of course, this expression is compatible with equation (130). 

The first term on the right-hand side of equation (132) should be valid even if the hierarchical model
does not apply. Indeed, it simply says that the number of independent cells in the sample is V/v.
The second term is however more questionable. An alternative calculation of this term was
made by Colombi (1993), who did not assume anything about the scaling behavior of the underlying
distribution, but used a scale reasoning, the spherical top hat approximation and the fact that the
distribution was initially gaussian. We simply quote here the result

"^"^'^ ^{nvfa''l'%{L), (133) 



which is quite similar (but not equal) to the second term on the right-hand side of equation (132).

B.2.2 Rare voids, the largest void

In the approximation (132), we see that the error always becomes larger than unity above the scale
$\ell_{\rm cut}$ defined by

$P_0(n,\ell_{\rm cut})\, V/v(\ell_{\rm cut}) \simeq 1$. (134)

This is to be compared with the certainly more accurate estimate (coming from eq. [127])

$P_0(n,\ell_{\rm cut})\, V/v(\ell_{\rm cut}) \simeq 14.2\,[-\ln P_0(n,\ell_{\rm cut})]^{-3}$. (135)

Indeed, this equation can be considered as a generalization of a result of Politzer & Preskill (1986),
who computed the number $N_{\rm voids}$ of "rare" void cells in a poissonian sample of volume V. Their
result is

$N_{\rm voids}^{\rm Poi} = \frac{V}{v}(nv)^3 P_0$, $P_0 \ll 1$. (136)

There is typically only one void when $P_0(n,\ell)V/v \simeq (nv)^{-3} = [-\ln P_0(n,\ell)]^{-3}$, which is quite
similar to equation (135) if we forget the factor 14.2. Equation (136) may thus be generalized for
a clustered distribution of points as

$N_{\rm voids} \simeq \frac{V}{v}[-\ln P_0]^3 P_0$, $P_0 \ll 1$. (137)
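For a purely poissonian sample the condition (134) can be solved explicitly, which gives a feeling for the orders of magnitude involved. With $P_0 = \exp(-nv)$ and $x = nv$, the condition reads $x + \ln x = \ln(nV)$, where nV is the total number of objects. The sketch below solves it by bisection; the poissonian assumption and the function name are ours, and the result is not meant to reproduce the values found for the clustered CDM sample.

```python
import math


def cut_nv(n_total):
    """Solve exp(-x) * n_total / x = 1 for x = n v (poissonian sample, n_total = n V)."""
    target = math.log(n_total)
    lo, hi = 1e-9, max(1.0, target)  # x + ln(x) - target changes sign on [lo, hi]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative case: as many objects as in the CDM sample, but poissonian.
n_total = 262144
x_cut = cut_nv(n_total)
print(x_cut, math.exp(-x_cut))
```

The cutoff thus occurs where the mean count per cell is only about ten, i.e. where the void probability has dropped to a few $10^{-5}$; clustering moves this scale upward, since large voids are then much more frequent.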

For our CDM sample, we obtain $\log_{10}\ell_{\rm cut} \simeq -1.3$ from equation (134) and $\log_{10}\ell_{\rm cut} \simeq -1.25$
from equation (135), which are quite similar values and are moreover in agreement with
Fig. 4. Indeed the far right point of this figure corresponds to the measurement of function $\sigma$ in E
for $\log_{10}\ell = -1.2$. It is abnormally shifted, whereas the next point on the left, which
corresponds to $\log_{10}\ell = -1.4$, seems to be good.

When $\ell \gtrsim \ell_{\rm cut}$, two cases can be considered:

1. If the largest void is slightly too large in comparison with its expected size (which can be
obtained by averaging over numerous realizations of volume V), the VPDF is artificially
overestimated when $\ell > \ell_{\rm cut}$, presents at larger scale an abrupt cutoff and vanishes at
$\ell_{\rm cutoff} > \ell_{\rm cut}$. In this case, the measured function $\sigma$ presents a cutoff between $\ell_{\rm cut}$ and $\ell_{\rm cutoff}$,
then increases suddenly toward infinity when $\ell$ tends to $\ell_{\rm cutoff}$. This is illustrated by Fig. 6
(but our logarithmic bin is too large to see the increase of $\sigma$).

2. If the largest void is slightly too small in comparison with its expected value, then the VPDF
presents only an abrupt cutoff between $\ell_{\rm cut}$ and $\ell_{\rm cutoff}$, which means that the measured $\sigma$
increases rapidly to infinity when $\ell \to \ell_{\rm cutoff}$. This is the case of E, which is contaminated
by grid effects that tend to decrease the expected size of the voids.

B.2.3 Example 

Let us now check, through a simple example, whether the estimate (132) of the error on the function
$P_0$ is realistic. We have extracted from our CDM sample eight adjacent cubical subsamples of size
$L_{\rm sample} = L_{\rm box}/2$, in which we have measured the function
$\hat\sigma(n,\ell) = -\ln(P_0)/n\ell^3$. Figure 14 gives its logarithm as a function of $\log_{10}(\ell)$
(triangles refer to the main sample, and curves to the subsamples). The far right square indicates
that $|\Delta P_0/P_0| > 1$. To estimate $\bar\xi_2(L_{\rm sample})$ we have directly measured it in our CDM
sample, and we find $\bar\xi_2(L_{\rm box}/2) \simeq 0.04$. The errorbars in Fig. 14 represent the expected
error on function $\hat\sigma$ (using eq. [37]). They seem to be of the appropriate size, although slightly
too large. But we must be aware that equation (132) is rigorously valid only if the hierarchical
model (106) applies, which is not obviously the case for our CDM sample, which, for example,
exhibits slight deviations from the scaling relation at large scales (see § 3 and § 4.2.1).
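As a rough illustration of the subsample test described above, the following Python sketch measures
$\hat\sigma = -\ln(P_0)/n\ell^3$ in a box and in its eight octants, and uses the scatter between octants
as an empirical error. It is only a sketch under stated assumptions: the point set is a uniform
random (Poisson-like) stand-in, not the CDM sample, for which $\hat\sigma \simeq 1$ is expected; cells are
laid on a regular grid rather than thrown at random; and the names `sigma_hat` and
`octant_subsamples`, as well as all numerical values, are hypothetical.

```python
import numpy as np

def sigma_hat(points, box_size, cells_per_side):
    """Return -ln(P0) / (n l^3), with P0 the fraction of empty grid cells."""
    counts, _ = np.histogramdd(points, bins=cells_per_side,
                               range=[(0.0, box_size)] * 3)
    p0 = np.mean(counts == 0)                       # void probability estimate
    cell_volume = (box_size / cells_per_side) ** 3  # l^3
    n = len(points) / box_size ** 3                 # mean number density
    return -np.log(p0) / (n * cell_volume)          # diverges if no cell is empty

def octant_subsamples(points, box_size):
    """Split a cubical sample into eight adjacent cubes of size box_size / 2."""
    half = box_size / 2.0
    subsamples = []
    for corner in np.ndindex(2, 2, 2):
        origin = half * np.array(corner)
        mask = np.all((points >= origin) & (points < origin + half), axis=1)
        subsamples.append(points[mask] - origin)    # shift to local coordinates
    return subsamples

# Assumption: uniform random points stand in for the real sample.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(8000, 3))

main = sigma_hat(points, 1.0, 20)                   # cell size l = 0.05
subs = [sigma_hat(s, 0.5, 10) for s in octant_subsamples(points, 1.0)]
scatter = np.std(subs)                              # empirical error from subsamples

print("main sample sigma_hat:", main)
print("subsample scatter:", scatter)
```

Comparing `scatter` with the error predicted by an analytic estimate is the kind of check performed
in Fig. 14; in a clustered sample the scatter is expected to be larger than in this Poisson-like toy.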

B.3 Finite volume error on $\xi_2$

In the following, we neglect discreteness effects. Practically, the two-body correlation is computed
in a sample of finite volume $V$, i.e.,

$$\hat\xi(r) = \frac{1}{4\pi V}\int \delta(\mathbf{x})\,\delta(\mathbf{x} + r\,\mathbf{e}_{\theta,\varphi})\,\sin\theta\,d\theta\,d\varphi\,d^3x, \qquad (138)$$

where $\mathbf{e}_{\theta,\varphi} = (\cos\varphi\sin\theta, \sin\varphi\sin\theta, \cos\theta)$ and $\delta(\mathbf{r})$ is the
density contrast of the underlying distribution. In the following, we shall denote by
$\langle X\rangle$ the ensemble average of quantity $X$. For example, the real two-body correlation of the
studied infinite set is

$$\xi = \langle\hat\xi\rangle. \qquad (139)$$

The mean square deviation is

$$(\Delta\xi)^2 = \langle(\hat\xi - \langle\hat\xi\rangle)^2\rangle. \qquad (140)$$

Figure 14: Logarithm of $\hat\sigma$ as a function of $\log_{10}\ell$ as measured on our CDM sample E
(triangles) and eight adjacent subsamples of size $L_{\rm box}/2$. The errorbars are given by
equations (37), (132). The far right square indicates that $|\Delta P_0/P_0| > 1$.



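So we have

$$(\Delta\xi)^2 = \frac{1}{(4\pi V)^2}\left\langle\int \delta(\mathbf{x}+\mathbf{r}_1)\,\delta(\mathbf{x})\,\delta(\mathbf{y}+\mathbf{r}_2)\,\delta(\mathbf{y})\,\sin\theta_1\,d\theta_1\,d\varphi_1\,\sin\theta_2\,d\theta_2\,d\varphi_2\,d^3x\,d^3y\right\rangle - \xi^2, \qquad (141)$$

with $\mathbf{r}_i = r\,\mathbf{e}_{\theta_i,\varphi_i}$, $i = 1,2$. If we invert the ensemble averaging and the
integral, we get

$$(\Delta\xi)^2 = \frac{1}{(4\pi V)^2}\int \big[\xi_4(\mathbf{x}+\mathbf{r}_1, \mathbf{x}, \mathbf{y}+\mathbf{r}_2, \mathbf{y})
+ \xi(|\mathbf{x}-\mathbf{y}+\mathbf{r}_1-\mathbf{r}_2|)\,\xi(|\mathbf{x}-\mathbf{y}|)
+ \xi(|\mathbf{x}-\mathbf{y}+\mathbf{r}_1|)\,\xi(|\mathbf{x}-\mathbf{y}-\mathbf{r}_2|)\big]\,
\sin\theta_1\,d\theta_1\,d\varphi_1\,\sin\theta_2\,d\theta_2\,d\varphi_2\,d^3x\,d^3y. \qquad (142)$$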



If the factorized hierarchical model applies [so the system obeys the scaling relation (2)],
$\xi_4$ can be written (Fry & Peebles 1978, Sharp et al. 1984)

$$\xi_4(1,2,3,4) = R_a\,[\xi(1,2)\,\xi(2,3)\,\xi(3,4) + {\rm cyc.\ (12\ terms)}]
+ R_b\,[\xi(1,2)\,\xi(1,3)\,\xi(1,4) + {\rm cyc.\ (4\ terms)}]. \qquad (143)$$
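The term counts quoted in equation (143) can be checked by brute force. The short Python sketch
below enumerates the chain terms and the star terms for four labelled points, and also counts the
terms that contain both $\xi(1,2)$ and $\xi(3,4)$. If the points are labelled in the order of the
arguments of $\xi_4$ in equation (142), i.e., $1 = \mathbf{x}+\mathbf{r}_1$, $2 = \mathbf{x}$,
$3 = \mathbf{y}+\mathbf{r}_2$, $4 = \mathbf{y}$ (an assumption about the labelling), these are the terms
that carry a factor $\xi(r)^2$. The helper name `edge` is hypothetical.

```python
from itertools import permutations

points = (1, 2, 3, 4)

def edge(a, b):
    """Unordered pair (a, b), standing for the factor xi(a, b)."""
    return frozenset((a, b))

# Chain terms xi(a,b) xi(b,c) xi(c,d): a path and its reverse give the same product.
chains = {frozenset((edge(a, b), edge(b, c), edge(c, d)))
          for a, b, c, d in permutations(points)}

# Star terms xi(a,b) xi(a,c) xi(a,d): one per choice of the central point a.
stars = {frozenset(edge(a, b) for b in points if b != a) for a in points}

# Terms containing both xi(1,2) and xi(3,4).
both = [term for term in chains | stars
        if {edge(1, 2), edge(3, 4)} <= term]

print(len(chains), "chain terms,", len(stars), "star terms")
print(len(both), "terms contain both xi(1,2) and xi(3,4)")
```

Only chain terms can contain two disjoint pairs, so every term counted in `both` is weighted by
$R_a$ rather than $R_b$.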



Equation (142) is now a sum of various integrals over products of $\xi$. The dominant term is, for
$r$ small enough [more precisely for $(r^3/V)(1+\xi) \ll \bar\xi_2(L)$],

$$(\Delta\xi)^2 \simeq 4R_a\,\xi^2\,\bar\xi_2(L). \qquad (144)$$

In the observed galaxy distribution, we have $Q_4 = (12R_a + 4R_b)/16 \simeq 2.9 \pm 0.5$ (Fry & Peebles
1978), so, taking $R_a \simeq Q_4$, the relative error on $\xi$ is approximately
$\Delta\xi/\xi \simeq 3.4\sqrt{\bar\xi_2(L)}$.

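We can also evaluate $Q_4$ as a function of $S_4 = \bar\xi_4/\bar\xi_2^3$. Using the approximation
$S_4 \simeq 16Q_4$ (see, e.g., BS), and $R_a \simeq R_b$, we find

$$\frac{\Delta\xi}{\xi} \simeq \frac{1}{2}\sqrt{S_4\,\bar\xi_2(L)}. \qquad (145)$$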

C Contamination effects on function h 

In the following, we assume that the scaling relation applies. The domain of validity $D_h$ (eq. [41])
of equation (9) is asymptotic. So practically, $\hat h$ never scales exactly as a function $h(x)$.
Moreover, some misleading effects can be brought by the finiteness of the sampled volume. So
$\hat h$ can be written as

$$\hat h(N,n,\ell) = U(N,\ell,n)\,h(x), \qquad (146)$$

where $U(N,\ell,n)$ is a factor which tends to unity when $(N,\ell) \in D_h$ and when the size of the
sample tends to infinity. It is then useful to separate the different contamination effects that may
intervene. So we write

$$U(N,\ell,n) = [1 + \Delta_{\rm dis}(N,n,\ell)]\,[1 + \Delta_{\rm fin}(N,\ell)]. \qquad (147)$$

The term $\Delta_{\rm dis}(N,n,\ell)$ is related to discreteness effects and to the contamination by underdense
regions. The term $\Delta_{\rm fin}(N,\ell)$ is linked to finite volume effects. These two effects are mixed up,
so it is rigorously impossible to treat them separately. Our aim here is simply to estimate them
roughly in order to understand the main features of the deviations from the theory brought by these
defects. For a detailed evaluation of $\Delta_{\rm dis}(N,n,\ell)$ as a function of $N$, we refer to the appendix
of BSD. Note that, to estimate $\Delta_{\rm dis}(N,n,\ell)$, BSD assume that the function $\sigma(N_c)$ has reached
its asymptotic power-law behavior at large $N_c$. In § C.1, without making such a hypothesis, we
roughly estimate some average of the number $\Delta_{\rm dis}$ by using moments of the CPDF. In § C.2, we
use a similar reasoning to estimate the change in the overall normalization brought by finite volume
effects.

C.1 Contamination by the finiteness of the available scaling range

Let us assume that our sample is of very large volume, so that finite volume effects are negligible.
The calculation of the moment of order two $\nu_2 = \sum N^2 P_N$ of the function $P_N(n,\ell)$ provides

$$\nu_2 = n\ell^3 \sum_{N=1}^{\infty} (N/N_c)^2\,\hat h(N,n,\ell) = n\ell^3\,(1 + N_c + n\ell^3). \qquad (148)$$

The above sum is dominated by values of $N$ around $N_c$. When $N_c$ is larger than a few units,
equality (148) is dominated by values of $N$ large enough that the above sum can be replaced by an
integral. We can thus write, if $N_c$ is larger than a few units,

$$\nu_2 = \bar N N_c \int_0^\infty [1 + \Delta_{\rm dis}(N_c x,n,\ell)]\,x^2 h(x)\,dx. \qquad (149)$$

With the definition

$$\langle\Delta_{\rm dis}(n,\ell)\rangle_Q = \frac{\int_0^\infty \Delta_{\rm dis}(N_c x,n,\ell)\,x^Q h(x)\,dx}{\int_0^\infty x^Q h(x)\,dx}, \qquad (150)$$

we finally get

$$\langle\Delta_{\rm dis}(n,\ell)\rangle_2 = \frac{1}{N_c} + \frac{1}{\bar\xi_2}, \qquad (151)$$

since function $h(x)$ has to obey the normalizations (46).
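Equivalently, we have at first order in $1/N_c$ and $1/\bar\xi_2$:

$$\langle\Delta_{\rm dis}(n,\ell)\rangle_3 \simeq \frac{3}{S_3}\left(\frac{1}{N_c} + \frac{1}{\bar\xi_2}\right),$$
$$\langle\Delta_{\rm dis}(n,\ell)\rangle_4 \simeq \frac{2S_3}{S_4}\left(\frac{3}{N_c} + 2\,\frac{1 + 0.75/S_3}{\bar\xi_2}\right),$$
$$\langle\Delta_{\rm dis}(n,\ell)\rangle_5 \simeq \frac{5S_4}{S_5}\left(\frac{2}{N_c} + \frac{1 + 2S_3/S_4}{\bar\xi_2}\right). \qquad (152)$$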



The calculation of $\langle\Delta_{\rm dis}(n,\ell)\rangle_Q$ for our CDM universe provides values that decrease
with $Q$, which indicates that $\Delta_{\rm dis}(N_c x,n,\ell)$ should be a decreasing function of $x$. Indeed,
the larger $Q$, the larger the values of $x$ that contribute to the moment of order $Q$ of the count
probability. The variations of $\langle\Delta_{\rm dis}(n,\ell)\rangle_Q$ with $Q$ depend on the degree of
clustering of the system. For example, our fractal F, which is much less correlated than E in the
sense that its $S_Q$ increase much less rapidly with $Q$, has $\langle\Delta_{\rm dis}(n,\ell)\rangle_Q$ increasing
with $Q$. The dependence of $\Delta_{\rm dis}$ on $N$ is however complex. It has been thoroughly evaluated by
BSD in the regime where $\sigma$ reaches its asymptotic power-law behavior. The calculation of BSD also
includes terms of higher order in $1/N_c$ and $1/\bar\xi_2$ than equations (151) and (152).

Nevertheless, at $x \sim 1$, which should be in the vicinity of the maximum of the function
$x^2 h(x)$, we should roughly have

$$\Delta_{\rm dis}(N,n,\ell) \sim \frac{1}{N_c} + \frac{1}{\bar\xi_2}. \qquad (153)$$

Thus, the shift between function $\hat h$ and function $h$ around $x \sim 1$ should increase as the sample
is diluted (because $N_c$ is proportional to the number density $n$), in qualitative agreement with
figure 12. Equation (153) also indicates that if we cannot have both $N_c > 1$ and $\bar\xi_2 > 1$ in the
considered sample, the study of function $h(x)$ becomes difficult and possibly meaningless.
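The estimates of this subsection can be checked numerically from the moments of the counts. The
Python sketch below uses the standard relations between moments and factorial moments of a discrete
distribution, together with two assumptions that are not spelled out in this appendix:
$N_c = \bar N\,\bar\xi_2$ and $\int_0^\infty x^Q h(x)\,dx = S_Q$, so that
$1 + \langle\Delta_{\rm dis}\rangle_Q = \nu_Q/(\bar N N_c^{Q-1} S_Q)$. The function names and the numerical
values of $\bar N$, $\bar\xi_2$ and $S_3$ are purely illustrative.

```python
def delta_dis_2(nbar, xi2):
    """Exact <Delta_dis>_2 from nu_2 = Nbar (1 + N_c + Nbar)."""
    nc = nbar * xi2                      # assumed definition N_c = Nbar * xi2_bar
    nu2 = nbar * (1.0 + nc + nbar)
    return nu2 / (nbar * nc) - 1.0

def delta_dis_3(nbar, xi2, s3):
    """Exact <Delta_dis>_3 and its first-order form in 1/N_c and 1/xi2_bar."""
    nc = nbar * xi2
    f1 = nbar                                        # <N>
    f2 = nbar**2 * (1.0 + xi2)                       # <N(N-1)>
    f3 = nbar**3 * (1.0 + 3.0 * xi2 + s3 * xi2**2)   # <N(N-1)(N-2)>
    nu3 = f3 + 3.0 * f2 + f1                         # <N^3>
    exact = nu3 / (nbar * nc**2 * s3) - 1.0
    first_order = (3.0 / s3) * (1.0 / nc + 1.0 / xi2)
    return exact, first_order

# Illustrative values only (not measured on any sample of the paper).
nbar0, xi2, s3 = 50.0, 20.0, 5.0

for dilution in (1.0, 0.1, 0.01):
    nbar = dilution * nbar0              # dilution lowers Nbar, hence N_c, at fixed xi2_bar
    print(dilution, delta_dis_2(nbar, xi2), delta_dis_3(nbar, xi2, s3))
```

At fixed $\bar\xi_2$, diluting the sample leaves the $1/\bar\xi_2$ term unchanged and increases the
$1/N_c$ term, which is the behavior described after equation (153).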

C.2 Finite volume effects 

We assume here that our sample is of finite volume, but also that the above effects are negligible,
i.e., that we have $\ell_c \ll \ell \ll \ell_0$ and $N \gg 1$, $N \gg N_v$. Because the sampled volume is
finite, the maximum number $N_{\rm max}$ of objects that can be found in a cell is necessarily finite,
and smaller than $N_{\rm par}$, the total number of objects in the set. So the second order moment of the
count probability is (in the continuum limit, i.e., $\ell \gg \ell_c$)

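$$\nu_2 = \bar N N_c = \sum_{N=1}^{N_{\rm max}} N^2 P_N = \bar N N_c \int_0^{N_{\rm max}/N_c} [1 + \Delta_{\rm fin}(xN_c,n,\ell)]\,x^2 h(x)\,dx, \qquad (154)$$

where $N_{\rm max}$ is the maximum number of objects per cell. So we find

$$\int_0^{N_{\rm max}/N_c} [1 + \Delta_{\rm fin}(xN_c,n,\ell)]\,x^2 h(x)\,dx = 1. \qquad (155)$$

Now we know that function $h$ obeys normalizations (46), so the quantity

$$1 + \langle\Delta_{\rm fin}(\ell)\rangle_Q \equiv \int_0^{N_{\rm max}/N_c} [1 + \Delta_{\rm fin}(xN_c,n,\ell)]\,x^Q h(x)\,dx \Big/ \int_0^{N_{\rm max}/N_c} x^Q h(x)\,dx \qquad (156)$$

is simply, when $Q = 2$,

$$1 + \langle\Delta_{\rm fin}(\ell)\rangle_2 = 1 \Big/ \int_0^{N_{\rm max}/N_c} x^2 h(x)\,dx > 1. \qquad (157)$$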

Since function $x^2 h(x)$ has its maximum in the vicinity of $x \sim 1$, we can expect that for
$x \sim 1$, $\Delta_{\rm fin}(xN_c,\ell) \sim \langle\Delta_{\rm fin}(\ell)\rangle_2$. But this is true only if $N_{\rm max}$ is
large enough in comparison with $N_c$, so that $N_c$ is not affected by finite sample effects when
$\ell < \ell_0$ (which is indeed the case in our CDM sample as shown by CBS). The same calculation can be
made for $Q \geq 3$. However, when $Q \geq 3$, one tests finite volume effects on the quantities $S_Q$ (see,
e.g., CBS) rather than a difference of normalization between functions $\hat h$ and $h$. Indeed, the
position of the maximum of $x^Q h(x)$ increases with $Q$, so it is likely to be of the order of
$N_{\rm max}/N_c$ or even larger.
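To give a feeling for the size of the normalization shift in equation (157), the following Python
sketch evaluates $1/\int_0^{N_{\rm max}/N_c} x^2 h(x)\,dx - 1$ for a toy scaling function. The form
$h(x) = A\,x^{\omega-2}\exp(-\omega x)$ with $A = \omega^\omega/\Gamma(\omega)$ is an assumption chosen
only because it satisfies $\int x\,h\,dx = \int x^2 h\,dx = 1$; it is not the function measured in
this paper, and the value $\omega = 0.5$ and the function names are hypothetical.

```python
import math

OMEGA = 0.5  # hypothetical shape parameter of the toy h(x)

def toy_h(x, omega=OMEGA):
    """Toy h(x) = A x^(omega-2) exp(-omega x), with int x h = int x^2 h = 1."""
    amplitude = omega ** omega / math.gamma(omega)
    return amplitude * x ** (omega - 2.0) * math.exp(-omega * x)

def truncated_second_moment(x_max, steps=200_000):
    """Midpoint-rule estimate of the integral of x^2 h(x) from 0 to x_max."""
    dx = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += x * x * toy_h(x)
    return total * dx

def delta_fin_2(x_max):
    """<Delta_fin>_2 as in equation (157), with x_max = N_max / N_c."""
    return 1.0 / truncated_second_moment(x_max) - 1.0

for x_max in (5.0, 10.0, 20.0):
    print(f"N_max/N_c = {x_max:5.1f}   <Delta_fin>_2 = {delta_fin_2(x_max):.4f}")
```

In this toy model the shift decreases quickly as $N_{\rm max}/N_c$ grows, because the exponential tail
of $h$ contributes little to the second moment once the cutoff lies well beyond $x \sim 1$.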

With regard to our CDM sample, we find that $\langle\Delta_{\rm fin}(\ell)\rangle_2 < 0.12$ when
$\log_{10}(\ell) \leq -1.6$, and $\langle\Delta_{\rm fin}(\ell)\rangle_2 > 0.26$ when $\log_{10}(\ell) \geq -1.2$. So the shift
between $\hat h$ and $h$ brought by the volume finiteness is not negligible, and should be even more
significant for systems having more power at large scales, such as hot dark matter numerical samples.
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